Designing a Scalable Processor Array for Recurrent Computations

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 8, NO. 8, AUGUST 1997

Kumar N. Ganapathy, Benjamin W. Wah, Fellow, IEEE, and Chien-Wei Li

Abstract: In this paper, we study the design of a coprocessor (CoP) to execute efficiently recursive algorithms with uniform dependences. Our design is based on two objectives: 1) fixed bandwidth to main memory (MM) and 2) scalability to higher performance without increasing MM bandwidth. Our CoP has an access unit (AU) organized as multiple queues, a processor array (PA) with regularly connected processing elements (PEs), and input/output networks for data routing. Our design is unique because it addresses input/output bottleneck and scalability, two of the most important issues in integrating processor arrays in current systems. To allow processor arrays to be widely usable, they must be scalable to high performance with little or no impact on the supporting memory system. The use of multiple queues in AU also eliminates the use of explicit data addresses, thereby simplifying the design of the control program. We present a mapping algorithm that partitions a data dependence graph (DG) of an application into regular blocks, sequences the blocks through AU, and schedules the execution of the blocks, one at a time, on PA. We show that our mapping procedure minimizes the amount of communication between blocks in the partitioned DG, and sequences the blocks through AU to reduce the communication between AU and MM. Using the matrix-product and transitive-closure applications, we study design trade-offs involving 1) division of a fixed chip area between PA and AU, and 2) improvements in speedup with respect to increases in chip area. Our results show, for a fixed chip area, 1) that there is little degradation in throughput in using a linear PA as compared to a PA organized as a square mesh, and 2) that the design is not sensitive to the division of chip area between PA and AU. We further show that, for a fixed throughput, there is an inverse square root relationship between speedup and total chip area.
Our study demonstrates the feasibility of a low-cost, memory bandwidth-limited, and scalable coprocessor system for evaluating recurrent algorithms with uniform dependences.

Index Terms: Access unit, affine dependences, area index, clock-rate reduction, dependence graph, memory bandwidth, multimesh graph, partitioning, processor array, scheduling, uniform dependences.

K.N. Ganapathy is with the Telecommunications Division, Rockwell International, Newport Beach, CA. E-mail: kumar@nb.rockwell.com.
B.W. Wah and C.-W. Li are with the Coordinated Science Laboratory, University of Illinois, Urbana-Champaign, Urbana, IL 61801. E-mail: {wah, cwli}@manip.crhc.uiuc.edu.
Manuscript received 9 Aug. 199[...]; revised July 1996.
For information on obtaining reprints of this article, please send e-mail to: transpds@computer.org, and reference IEEECS Log Number [...].

1 INTRODUCTION

Processor arrays have been designed for efficient computation of recurrences. In mass quantities, the production of such fixed-function arrays is manageable and economical. However, when a single processor array is to be used for a new application, a manufacturer will have to take the long and costly process of designing and fabricating the application-specific integrated chip. Although the cost of such designs has decreased in recent years, budget constraints have motivated a trend away from custom hardware development except in cases in which the performance required justifies the cost of developing such specialized hardware. Consequently, general-purpose or programmable processor-array architectures are more attractive alternatives.

This paper discusses the design of a parallel VLSI coprocessor (CoP) that is programmable for computing recurrences. The CoP interfaces with a front-end host machine that is responsible for data input and control. The overall requirements on the CoP are:

• Fixed data bandwidth to main memory for easy integration into existing systems;
• Scalability, or ability to increase performance by adding processor/memory modules without increasing the bandwidth to main memory; and
• Ability to execute efficiently the class of algorithms with uniform dependences.

The last requirement indicates the domain of applications targeted for CoPs.
These include nested-loop algorithms with uniform dependences, which involve uniform recurrences (UREs), and uniformized affine recurrences (Section 3). Such loops are found frequently in signal and image processing, scientific computations, matrix and linear algebra computations, optimization, digital communications, and control. Although an application-specific design for each application would result in higher performance, we have chosen a common retargetable architecture that can be reused for a number of applications. The proposed architecture can be visualized as a coprocessor to workstations or as a VLSI pipeline in supercomputers for loop computations, similar to vector functional units for executing vector instructions.

Our approach is to design a combination of hardware architectures and software mapping methods (Section 4) under fixed main-memory bandwidth (Section 2). The trade-offs we consider include the following (Section 5).

• For a given throughput, reduce clock frequency by increasing area;
• For a given clock rate, increase throughput by increasing area.

Fig. 1. Coprocessor architecture proposed to solve a class of algorithms modeled by uniform recurrences.

There have been numerous efforts to develop general-purpose systolic computers in the past ten years. These include Warp and iWarp [], Matrix-1 [8], SLAPP [6], medium-grain architecture for image and signal processing [7], VATA [0], pseudosystolic linear array [], [], and a host of others. However, many of these designs have powerful processors with large local memories, and high-bandwidth data interconnect between processors and the host/global memory. Hence, the cost of such systems is very high. In our approach, we aim at designing a simple, resource-limited VLSI array processor that can be attached to any standard single-port main memory (with fixed bandwidth and long latencies), resulting in a system with low cost and acceptable performance for our target applications. This is in contrast to most existing architectures, in which memory bandwidth is increased proportionally as the system is scaled for higher performance.

Recently, there have been efforts to develop a systematic approach to partition and map matrix algorithms on mesh-connected processor arrays [], [], [0]. Our approach differs in trying to extract maximum data reuse under the constraint of a low-bandwidth interconnect to main memory in our CoP. This research also differs from traditional work on mapping/partitioning data on fixed-size processor arrays [], [9], [5], [], [5], [9], [7], [5] by assuming only limited storage space in the processor array and by considering the effect of main-memory latency due to low-bandwidth interconnection to the main memory.

The rest of this paper is organized as follows. We first present our proposed array processor (Section 2), describe the target algorithms (Section 3) and the mapping and partitioning techniques (Section 4), evaluate and discuss results using matrix product and transitive closure as example applications (Section 5), and discuss the impact of clock-rate reduction on overall system performance (Section 6).

2 COPROCESSOR ARCHITECTURE AND RATIONALE

The architecture studied in this paper consists of the following components (Fig. 1
):

• An external main memory (MM) for storing input and output data,
• A dedicated processor array (PA) for executing the computations of a given algorithm, and
• An access-unit (AU) for streamlining the flow of data between PA and MM.

Section 2.1 describes the individual components in more detail. Section 2.2 compares our architecture to existing architectures.

2.1 Coprocessor Architecture

Main Memory (MM): The main memory in the architecture is a standard (usually interleaved) memory for storing data involved in the computations. The data in MM are accessed by supplying a stream of memory addresses to the address decoder in MM. The bandwidth of accesses is usually constrained by a small number of memory ports, and additional latency is incurred in decoding addresses. The assumption of accessing data from standard MM with low bandwidth simplifies the integration of CoPs to conventional hosts.

Fig. 2. Concatenation of CoPs to solve large problems or to improve speedup.

Note that we have abstracted the interface between CoP and MM as a port whose performance is aggregated by its bandwidth. In implementing the CoP architecture, the specific memory interface, such as Rambus, and its handshaking must be considered and be designed in hardware.

Access-Unit (AU): To overcome the bandwidth limitation and to mask the long memory latency of a shared MM, we include in our architecture a buffer memory called access-unit to store data accessed from MM and to feed PA at a much higher rate. AU has a fixed amount of storage to buffer the intermediate data that cannot be held in PA so that these data can be recirculated to PA to reduce the demand on MM. The storage in AU is organized as FIFO queues, and explicit memory addresses (except for queue numbers) are not used. In each cycle, the data present at the head of some of the queues are sent to PA or to MM through the output network, and data from PA or MM are sent to the tail of the queues. In addition, AU can 1) prefetch data from MM into its queues to hide the memory latency, 2) shift the different queues at different rates to reorder the data relative to each other, and 3) perform indirect addressing of MM in which a sequence of addresses obtained from MM is subsequently used to access data in MM. This indirect addressing corresponds to scatter-gather instructions in vector computers.

The use of queues in AU saves address bits, simplifies address decoding, and reduces silicon area in implementation. Further, the queue structure permits AU size to be scaled without changing the number of address bits, number of ports to MM, and number of ports to PA.

Processor Array (PA): The processor array is a regularly connected array of processing elements (PEs). Each PE in PA has a microprogrammed control that governs its operations. The control specifies the actions to be performed in a PE on receiving data from its neighbors. For example, to solve a matrix product expressed as a 3D recurrence, all PEs perform an inner-product computation c = c + a × b, where c, a, b are elements of matrices C, A, and B, respectively.
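As a rough illustration of how address-free FIFO queues can feed a PE, the following Python sketch models an AU as a set of named queues and a single PE that applies c = c + a × b to the operands at the queue heads. The class `AccessUnit`, the function `inner_product_pe`, and the queue names are hypothetical, introduced only for this example; the sketch ignores timing, ports, and the output network.

```python
from collections import deque


class AccessUnit:
    """Toy AU model: named FIFO queues, no explicit data addresses (assumed API)."""

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}

    def push(self, name, value):
        # Data arriving from MM or PA always enters at the tail of a queue.
        self.queues[name].append(value)

    def pop(self, name):
        # Data leaving for PA or MM is always taken from the head of a queue.
        return self.queues[name].popleft()


def inner_product_pe(au, a_queue, b_queue, length):
    """One PE repeatedly applying c = c + a * b to operands read from two queues."""
    c = 0
    for _ in range(length):
        c = c + au.pop(a_queue) * au.pop(b_queue)
    return c


au = AccessUnit(["a", "b"])
for a, b in zip([1, 2, 3], [4, 5, 6]):
    au.push("a", a)
    au.push("b", b)
result = inner_product_pe(au, "a", "b", 3)  # 1*4 + 2*5 + 3*6
```

Only the queue name selects the data, which mirrors the point that queue numbers are the only form of addressing needed in AU.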
The choice of the PE configuration is dictated by the requirement of scalability. For true hardware scalability, the number of memory ports, number of ports to PA, and number of address bits should be independent of the size of the architecture and the size of the problem being solved. This implies that PA should be I/O bounded with a constant number of ports. A possible configuration, then, is a linear array of PEs with two boundary PEs that communicate with the heads of the queues in AU. (A 2D mesh configuration is less desirable as the number of peripheral PEs is proportional to the square root of the total number of PEs.)

A linear array of PEs generally has lower throughput than a square array with the same number of PEs for executing nested loops with uniform dependences [9], [], [7], []. However, for the algorithms with multimesh dependence graph (DG) (see Section 3) and when MM bandwidth is fixed, we show in Section 5 that this degradation is small and that speedup is bounded by the times to read input matrices and to write output matrices.

Using a linear PA configuration, Fig. 2 shows a concatenation of several CoPs that can be used to improve the performance of solving a given problem or to solve a problem of large size. The FIFO queues of AU can be split over multiple chips when the entire buffer memory required for a large number of PEs cannot fit in a single chip. This results in a linear array of PEs with a linear array of AUs to buffer data. While each individual CoP can have an optimal division of chip area between PA and AU (as shown in Section 5) to fully mask the MM latency, the linear concatenation of CoPs may result in suboptimal division of the total silicon area across multiple chips and lower performance. However, we show in Section 5 that the performance of a linear PA is rather insensitive to AU size, resulting in a modular expansion of the system without significant degradation in performance.

For a given PA configuration, an important issue to be studied is the allocation of a fixed chip area between PA and AU. This issue, to be studied in Section 5, has not been addressed in other architectures based on memory queues.
In short, our architecture is characterized by the following parameters:

• #PE: number of PEs,
• b: local memory in each PE,
• B: ratio of bandwidths between AU-MM and AU-PA,
• p: size of AU in blocks,

• A_pe: chip area for implementing a PE in units of memory words (an A_pe of [...]00 means that each PE occupies an area equivalent to [...]00 words of memory),
• Topology of the PEs: linear or square.

Fig. 3. Relative comparison of CoP to other architectures with limited memory bandwidth.

Our procedure for designing CoPs has two major steps: 1) define a model of the architecture in terms of the parameters defined above, and 2) arrive at the final architecture by analyzing cost and performance trade-offs.

2.2 Comparisons to Existing Architectures

In this section, we perform a qualitative comparison of our CoP to other existing architectures with limited MM bandwidth. The attributes used are: 1) hardware cost, 2) software cost, 3) generality, and 4) hardware scalability. The last attribute measures the ability to increase performance by adding more processing units to the system in a modular fashion. The architectures considered in our comparison are:

• Systolic General-Purpose Processors (SGPP). These are programmable general-purpose PAs that have been built for a class of applications. Examples include Warp (in systolic mode) [], SLAPP [6], Matrix-1 [8], and medium-grain image processing architectures [7].
• Partitioned Systolic Arrays (PSA). These include research efforts aimed at designing fixed-size systolic arrays for solving large problems [], [9], [5], [], [5], [9], [7], [], [5], [].
• Systolic Arrays (SA). These refer to traditional, algorithm-specific, problem-size-dependent systolic arrays.
• Coprocessor (CoP). Our proposed coprocessor.

We do not compare our CoP to commercially available shared-memory multiprocessors (SMM) and distributed-memory multicomputers (DMM), as these systems have the memory bandwidth tuned to a specific configuration. An SMM is attached through a dedicated interconnection network to a set of memory modules. As an SMM is scaled to higher performance, the number of memory modules and the size of the network must also be increased. Likewise, a DMM with local memory in each processor also has increased memory sizes and bandwidths as the system is scaled to higher performance.

Fig. 3 compares the different architectures on the chosen attributes.
Note that the figure only shows the relative ordering between different architectures and does not represent the actual performance difference.

In terms of hardware cost (or, equivalently, hardware complexity), SA is the simplest with SGPP being the most expensive. CoP has about the same hardware complexity as PSA with an external buffer for recirculating data.

In terms of software cost or programmability, SA is the lowest, as there is no programming effort once the hardware is designed. Programmability of CoP is similar to that of SGPP: Both accept high-level sequential programs as input and use a compiler to map the executions.

For generality, SA is the most restricted in terms of the application and the problem size it is designed for. PSA relaxes the restriction on application size but is still tied to an application. CoP is more general than PSA because multiple applications can be mapped on the same hardware architecture. SGPP is the most general as it can solve similar application problems (modeled as UREs) as CoP, as well as problems solvable on a general-purpose DMM.

In terms of hardware scalability, SA is problem-size dependent and, hence, not scalable. PSA may be scalable if its architecture is designed properly. CoP and SGPP are both scalable by adding new PEs and local memory (AU in the case of CoP) to an existing system.

To summarize, our proposed CoP has low hardware cost, is programmable to solve a class of application problems, is scalable to a larger system by connecting additional PEs and AUs, and relies on a compiler to map computations in order to effectively exploit parallelism. Its design represents a good trade-off among the four attributes to solve the class of application problems considered in this paper. In Sections 4 and 5, we present methods to find the optimal partitioning of chip area between PA and AU.

3 TARGET ALGORITHMS AND TRANSFORMATIONS

In this section, we define the set of applications that can be handled by our proposed CoPs. The application domain is the set of nested-loop algorithms that can be modeled as multimesh graphs (MMGs) [], []. These can be used to represent the class of uniform dependences, and, if the recurrence is affine but not uniform, to uniformize the dependences.
Our starting points in our mapping process described in Section 4 are, therefore, multimesh graphs of (uniformized) UREs.

Fig. 4. 3D MMGs after applying Moreno and Lang's regularization procedure [], []. (a) Product of two N-by-N matrices. The MMG is an N × N × N cube with unit vectors along the axes as dependence vectors. (b) MMG for the transitive closure of a [...]-by-[...] matrix.

For instance, the following Fortran-like nested loops can be represented by a system of recurrence equations.

    DO (j_1 = l_1, u_1; j_2 = l_2, u_2; ...; j_n = l_n, u_n)
        S_1(J);
        ...
        S_t(J);
    END

The column vector $J = [j_1, j_2, \ldots, j_n]^T$ is the index vector (or index point). $S_u(J)$, $u = 1, \ldots, t$, are t assignment statements in iteration J having the following form:

$Z(y(J)) = f[Z(x_1(J)), \ldots]$, (1)

where [...].

Nested-loop structures have a direct correspondence with recurrence equations that provide a succinct mathematical representation for them. An n-dimensional recurrence equation is equivalent to a set of n nested loops, whose loop-carried dependences correspond to dependences in the recurrence equation.

Recurrences can be classified as uniform or nonuniform based on the nature of the dependences [6]. A recurrence equation, $Z(p) = f[Z(q_1), Z(q_2), \ldots]$, is called uniform (URE) if $q_i = p + d_i$, where $d_i$ is a constant n-dimensional vector independent of p and $q_i$. A recurrence equation is called affine or linear (LRE) if $q_i = A_i p + b_i$, where $A_i$ is a constant-coefficient n × n matrix, and $b_i$ is an n-dimensional vector. A recurrence equation is called nonlinear if $q_i = c(p)$, where c is a nonlinear function.

In the remainder of this paper, we use the dependence graph (DG) of a nested loop as a graphical tool to describe the partitioning and mapping procedures. The DG of an n-nested loop algorithm is defined over an n-dimensional integer lattice domain, where nodes correspond to operations inside the nested loops, and arcs correspond to loop-carried dependences. In this work, we restrict ourselves to recurrences with uniform dependences, which involve uniform recurrences and uniformized affine recurrences. Hence, only structural information of the algorithm, i.e., index set and dependence matrix, is needed.
The following examples show the recurrences and nested-loop representations of the matrix-product and transitive-closure algorithms.

EXAMPLE 1. The following nested-loop program is the uniformized version of the matrix-multiplication algorithm.

    DO (i = 1, N; j = 1, N; k = 1, N)
        A(i, j, k) = A(i, j - 1, k)
        B(i, j, k) = B(i - 1, j, k)
        C(i, j, k) = C(i, j, k - 1) + A(i, j, k) B(i, j, k)
    END

Intuitively, data $A_{i,k}$ are pipelined along the j axis, and $B_{k,j}$ along the i axis.

EXAMPLE 2. Consider the following three-dimensional (3D) recurrence with n = 3, [...] = 5:

$Z(k, i, j) = X(k, i)\,Y(j, k) + Z(k-1, i+1, j+1) + Z(k-1, i+1, j) + Z(k-1, i, j+1)$. (2)

After pipelining and uniformization, (2) becomes

$Z(k, i, j) = X(k, i, j-1)\,Y(k, i-1, j) + Z(k-1, i+1, j+1) + Z(k-1, i+1, j) + Z(k-1, i, j+1)$. (3)

The resulting set of dependences is equivalent to that of the Floyd-Warshall algorithm to compute the transitive closure of an N × N matrix.

Before mapping a recurrence on a CoP, we transform its DG using Moreno and Lang's uniformization or regularization procedure [], [] to convert the DG to a regular multidimensional mesh graph called multimesh graph (MMG). Although this procedure may introduce unnecessary operations in processing the recurrence, it allows the partitioning algorithm to partition the MMG into blocks of uniform sizes and shapes, and simplifies the mapping of these blocks in AU and the scheduling on PA (which in turn reduces the amount of control information needed in CoP).

The regularization procedure converts a fully parallel DG of a given matrix algorithm to a regular MMG by performing transformations to remove broadcasting, bidirectional flow of data, and irregular dependences. Informally, a matrix algorithm is described recursively by an outer loop with a loop body of vector, scalar, and other matrix algorithms. The restriction is that each node of the DG can have at most three operands. This property allows transformations to regularize the DG to an MMG with a 3D cube structure. Fig. 4a shows the cubical-mesh DG for the matrix-product algorithm in Example 1.

For the transitive-closure problem described in Example 2, we use Moreno and Lang's regularization procedure [] to obtain a 3D MMG with a cubical structure (Fig. 4b). The darkened nodes in the figure are delay nodes, which are added in the regularization procedure to obtain a cubical structure. We have chosen such a 3D MMG with delay nodes as it presents a uniform and simple method of executing the MMG. For an algorithm-specific design, the delay nodes would contribute to a significant portion of the execution time; hence, the MMG structure was not used in the design of systolic processors for computing transitive closures. For a CoP with a low-bandwidth port to an external MM, the regular 3D MMG structure with additional delay nodes is justified as it simplifies the mapping (using fewer control bits) and access patterns to MM. Hence, in studying the transitive-closure problem in this paper, we map the corresponding 3D MMG to our CoP.

We have chosen not to start from the recurrence definition of an application problem and derive its corresponding MMG before mapping it to a CoP. It is not necessary to show such a step because the technique of deriving MMGs is well understood in the literature [], []. Although one MMG may be better than another when mapped to a CoP, our focus in this paper is to illustrate how our mapping technique can be applied to map MMGs. As this process can be automated, users can apply it in conjunction with the process of deriving MMGs when given a new application problem in the future. A secondary reason for not showing the derivation of MMGs is space limitation.
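To make the uniformized matrix-multiplication recurrence of this section concrete (A pipelined along j, B pipelined along i, C accumulated along k), the following Python sketch evaluates it directly over an N × N × N index space. The function name `uniformized_matmul` and the convention of storing boundary inputs at index 0 are assumptions made for this illustration; they are not taken from the paper.

```python
def uniformized_matmul(a, b):
    """Evaluate the uniformized matrix-product recurrence on an N x N x N index space.

    a[i][k] enters on the boundary j = 0 and is pipelined along the j axis;
    b[k][j] enters on the boundary i = 0 and is pipelined along the i axis.
    """
    n = len(a)
    # Index 0 on each axis holds boundary (input) values; 1..n are computed.
    A = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    B = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    C = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for k in range(1, n + 1):
            A[i][0][k] = a[i - 1][k - 1]
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            B[0][j][k] = b[k - 1][j - 1]
    # Every statement depends only on a unit-distance neighbor, so the
    # dependence vectors are the three unit vectors of the 3D mesh.
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                A[i][j][k] = A[i][j - 1][k]
                B[i][j][k] = B[i - 1][j][k]
                C[i][j][k] = C[i][j][k - 1] + A[i][j][k] * B[i][j][k]
    # The finished product sits on the face k = n.
    return [[C[i][j][n] for j in range(1, n + 1)] for i in range(1, n + 1)]


product_2x2 = uniformized_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The extra copies of A and B are the price of uniformization: they replace broadcasts with nearest-neighbor transfers, which is what gives the DG its regular mesh structure.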
4 MAPPING PROCESS

In this section, we describe our method of mapping the high-level loop specification of an algorithm specified as MMGs to our target CoP. The entire mapping process, depicted in Fig. 5, can be broken down into the following steps:

1) Partition the MMG into blocks, where data in a block can be processed by PA in one pass without further partitioning (Section 4.1).
2) Schedule the execution of a single block on PA using GPM (Generalized Parameter Method, Section 4.2). This step is to get an optimal design of PA, in terms of computation time or other objectives and under various constraints. The design parameters determined in this step are the number of PEs in PA, local memory in each PE, and I/O bandwidth of PA.
3) Sequence the blocks through AU; i.e., determine which blocks will occupy AU at each time step (Section 4.3). The goal of this step is to establish a balanced division of chip area between PA and AU in order to minimize the penalty of MM accesses. In this step, AU size is fixed.
4) Generate address and code sequences for Steps 2 and 3. This step, although well-defined and well understood, is not trivial in implementation. The development of the code generator is ongoing at this time.

Fig. 5. Mapping the MMG of an algorithm to a CoP architecture.

Footnote: For a formal definition, see Chapter 5 of [].

The last step is well-defined once partitioning, sequencing, and scheduling are done. Hence, in the rest of this section, we describe only the first three steps of the mapping process. In addition to the symbols defined in Section 2.1, we define the following symbols:

• N: size of each dimension of a given MMG,
• m: size of each dimension of a parallelepiped (block) after partitioning (for the matrix-product and transitive-closure problems, the blocks are cubical of size m × m × m),
• n: number of dimensions of the recurrence equation, and
• p: size of AU in units of blocks (where the p blocks are arranged in a √p × √p tile).

4.1 Partitioning an MMG

The objective in this step of the mapping process is to partition a given MMG into nonoverlapping blocks of a maximum size that can be processed by PA in one pass. This step is necessary because PA has only a limited number of PEs and I/O ports.
In our partitioning procedure, we first define a partitioning matrix whose columns represent hyperplanes that

partition an MMG into blocks. We then present our partitioning procedure and demonstrate its properties. (For brevity, proofs are presented in Appendix A.) Finally, we illustrate the procedure by two examples.

Methods for finding independent partitions, in which the communication between blocks is zero, have been proposed before [6], [8]. However, when the original DG has only one connected component (as for the algorithms considered in this paper), independent partitioning results in only one block, i.e., the entire DG. A technique called supernode partitioning has been proposed [] to partition nodes in a DG that depend on each other, and reduce communication between supernodes by propagating results inside a supernode. However, a systematic way to find such partitions is not presented.

Partitioning algorithms have also been studied in the context of DMMs. Here, the goal is to partition data in order to maximize parallelism and to reduce data communication among processors. Since the goal of our partitioning algorithm is different and aims to minimize the amount of data transferred between AU and MM, data partitioning algorithms developed for DMMs can only be used as heuristics in CoPs. In other cases, DMM partitioning schemes have been developed for more restricted cases and are not applicable here. For instance, Kulkarni et al. [8] proposed a unimodular loop transformation for partitioning an iteration space in order to increase parallelism, achieve load balance, and minimize interpartition communications. However, the scheme cannot be applied in CoPs because it can only partition dependence graphs in 2D iteration space or doubly nested loops along one direction. It is desirable to have a general scheme that can handle higher dimensional nested loops and can partition along multiple directions in the iteration space.

Our approach to partitioning in this paper is similar to that of Moldovan and Fortes []. For an n-D algorithm, we find n independent hyperplanes to partition a DG into blocks. Hence, our blocks are parallelepipeds, and the shape can be described by a partitioning matrix P consisting of n partitioning vectors:

$P = [\,p_1 \; p_2 \; \cdots \; p_n\,]$.
(4)

Since we are dealing with uniform dependence algorithms, we restrict the sizes and shapes of all the blocks to be the same. This simplifies the address and code generation for PEs. The reason for choosing exactly n partitioning vectors in an n-D domain is that blocks will not be regular if we have other than n partitioning hyperplanes.

EXAMPLE 3. In Fig. 6, the index set is a 2D plane, and the blocks are formed by hyperplanes normal to three candidate partitioning vectors. Choosing one vector alone results in unequal or unbounded blocks. Similarly, choosing all three vectors results in unequal blocks. Choosing two of the vectors forms blocks of equal sizes except at the boundaries.

Fig. 6. Hyperplane partitioning of dependence graphs. The index set is a 2D plane. The three partitioning vectors correspond to a family of regularly spaced hyperplanes normal to the index set. They determine only the orientation of the hyperplanes, i.e., block shape, and not the spacing among them.

DEFINITION 1. Given an n-D DG G of an algorithm and a partitioning matrix P, $G_b$, the block-level DG of G, is a DG in n-D space, where

• Nodes in $G_b$ correspond to blocks of G, each consisting of all of the nodes within the parallelepiped defined by the n partitioning hyperplanes in P, and
• Edges in $G_b$ correspond to dependence vectors crossing the hyperplanes between adjacent blocks in G.

The following lemma presents the conditions for choosing a valid partitioning matrix P. (The proof is shown in Appendix A.)

LEMMA 1. The partitioning of a DG by partitioning matrix P is valid if and only if $P^t D \ge 0$ or $P^t D \le 0$, where D is the dependency matrix.

Next, we present a procedure for choosing a good partitioning vector that results in a very small amount of communication between adjacent dependent blocks.

PROCEDURE 1. Let g = rank(D), where $D = [\,d_1 \; d_2 \; \cdots \; d_k\,]$ is the dependency matrix. Hence, only g of the k dependence vectors are linearly independent. Without loss of generality, assume that the first g columns are linearly independent, and let $D' = [\,d_1 \; d_2 \; \cdots \; d_g\,]$ be an n × g matrix consisting of the g linearly independent vectors of D. Let $D_i$ be an n × (g − 1) matrix derived from $D'$ by dropping the ith column vector; i.e., $D_i = [\,d_1 \cdots d_{i-1} \; d_{i+1} \cdots d_g\,]$. The number of partitioning hyperplanes needed to partition the DG is g (since g = rank(D)). Hence, matrix $P = [\,p_1 \cdots p_g\,]$ is chosen such that $p_i$ is given by

$p_i^t D_i = 0$, where $1 \le i \le g$ and $p_i^t d_i > 0$. (5)

The idea is to choose $p_i$ as the basis vector of the left null space of matrix $D_i$ and invert the sign of the elements of $p_i$ if $p_i^t d_i < 0$. Hence, by construction, the partitioning matrix P is feasible.

COROLLARY 1. If the columns of matrix D form a normal basis (i.e., $d_i^t d_j = 0$, $i \ne j$), then P = D produces a valid partitioning matrix. (See Appendix A for the proof.)

The following examples illustrate the partitioning scheme and the fact that regularization of a DG into an MMG leads to a uniform way of partitioning the input algorithm.

Fig. 7. (a) Partitioned DG for computing matrix products, where partitioning matrix P is equal to dependence matrix D. (b) Partitioned DG for computing transitive closures, where partitioning matrix P is equal to the identity matrix I (the figure is drawn for N = [...] and m = [...]).

EXAMPLE 4. For a 3D matrix product, the dependency matrix is

$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

According to the procedure above, the partitioning matrix P = D is feasible, as $D^t D = I > 0$. (I is a three-by-three identity matrix.) Fig. 7a shows the partitioning of the DG by the partitioning matrix P = D. The block-level DG is a cubical 3D mesh of size (N/m) × (N/m) × (N/m).

For the MMG of the transitive-closure problem, the dependence vectors are $(1, 0, 0)^t$, $(0, 1, 0)^t$, $(0, 0, 1)^t$. Assuming that P is equal to I, Fig. 7b shows the block-level MMG of size [...] (N = [...], m = [...] in this figure). To arrive at this result, note that each plane in Fig. 4b is a square array of N × N nodes that are padded with one extra row/column of delay nodes for regularization. Hence, each plane has (N + 1) × (N + 1) nodes, and there are N such planes in the DG. When the DG is partitioned into cubical blocks of size m, there will be N/m nodes in the z-direction in the block-level DG and (N + 1)/m nodes in the y-direction. Also, since the DG is staggered from one plane to the next as in Fig. 4b, grouping m such planes together will lead to [...] nodes in the x-direction.

The following lemma (stated without proof [0]) shows that the above choice of P is good in the sense that it minimizes the amount of data communication between blocks.

LEMMA 2. The choice of the partitioning matrix P by Procedure 1 results in the minimum amount of communication between blocks in the partitioned DG.

The complexity of finding matrix P is $O\!\left(\binom{k}{g}\, n g^2\right)$, as there are $\binom{k}{g}$ ways of choosing g independent columns of D, and $O(n g^2)$ is the cost of finding a null-space vector of an n × (g − 1) matrix.

In general, the size of a block is $b_1 \times b_2 \times \cdots \times b_g$, where $b_i$, i = 1, ..., g, is chosen such that the entire block can be processed by PA in a single pass. The choice is based on the size of PA and the local memory in each PE.
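The validity condition on a partitioning matrix can be checked mechanically. The sketch below assumes the condition reads "every entry of P^t D is nonnegative, or every entry is nonpositive"; the inequality signs are damaged in this copy of the text, so that reading is an assumption. Matrices are stored as lists of rows, and the names `transpose_times` and `is_valid_partition` are hypothetical.

```python
def transpose_times(p, d):
    """Return P^t D, where p is n x g and d is n x k, both stored as lists of rows."""
    n, g, k = len(p), len(p[0]), len(d[0])
    return [[sum(p[r][i] * d[r][j] for r in range(n)) for j in range(k)]
            for i in range(g)]


def is_valid_partition(p, d):
    """Assumed reading of the lemma: P^t D >= 0 entrywise, or P^t D <= 0 entrywise."""
    entries = [x for row in transpose_times(p, d) for x in row]
    return all(x >= 0 for x in entries) or all(x <= 0 for x in entries)


# Matrix-product MMG: the dependence vectors are the three unit vectors.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
valid_when_p_equals_d = is_valid_partition(identity, identity)

# Flipping one partitioning vector mixes signs in P^t D.
flipped = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]
valid_when_flipped = is_valid_partition(flipped, identity)
```

With P = D = I, the product P^t D is the identity matrix, so the check passes; flipping the sign of a single partitioning vector yields both positive and negative entries and the check fails.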
In the matrix-product and transitive-closure examples studied in this paper, g = 3 and $b_1 = b_2 = b_3 = m$.

4.2 Executing a Block in PA

The goal in this step is to map a single block onto PA so that it can be executed in a single pass. Here, we use the Generalized Parameter Method (GPM) we have developed before [], [], [9] to determine the data distribution of inputs of the block chosen for execution; i.e., which data should be input into the boundary PEs at each time step. (Details of GPM are not critical in this paper and are not presented here.)

The objective when we apply GPM is to maximize the utilization of PA. If #PE, $T_c$, and $T_{seq}$ denote, respectively, the number of PEs in PA, completion time of all of the blocks, and serial time to compute the DG, then utilization U is given by

$U = \dfrac{T_{seq}}{\#PE \times T_c}$. (6)

This indicates that, for a given algorithm with fixed $T_{seq}$, maximizing U is equivalent to minimizing $\#PE \times T_c$. This objective tries to reduce the computation time of each block and increase the overlap between consecutive blocks in order to reduce the load/drain penalties of the blocks.

4.3 Sequencing Blocks through AU

When the recurrence is computed by a CoP, p blocks from MM are initially loaded into AU, and PA begins executing these p blocks. As the execution proceeds, new blocks are fetched from MM into AU, and some of the existing blocks in AU are written back to MM. As AU is of limited size, a block of data will have to be fetched multiple times from MM, and our goal in this step is to decide which blocks will be fetched into AU as execution proceeds such that the amount of memory traffic is minimized.

Previous works on loop transformations or compiler optimizations to improve data locality of loop nests, e.g., [], [], target loops of general dependence structures. These contribute necessary and sufficient conditions of valid loop

transformations, but are only heuristic transformations that aim to reduce the traffic in a memory hierarchy. In this study, instead of studying applications with general dependence structures, our sequencing procedure described below is for computations with dependence structures of MMGs. Despite the more limited focus of our work, we show at the end of this section that our sequencing procedure for MMGs is asymptotically optimal in minimizing MM traffic.

The blocks to reside in AU must be chosen to reduce the traffic between MM and AU; equivalently, data reuse should be maximized for the blocks in AU. We can think of AU forming a storage window or tile over the block-level DG of the application algorithm. AU stores all the data needed to compute the blocks in the storage window, and the output of this phase is to describe how the storage window will move over the block-level DG in a nonoverlapped fashion. (Otherwise, some computations will be redundant.)

The sequencing procedure is described below for a block-level DG in the form of an n-dimensional MMG with identity dependence matrix; i.e., $D = I_n$. Let the size of the block-level DG be $V = N_1 \times N_2 \times \cdots \times N_n$, where $N_i$, i = 1, ..., n, is the number of nodes along direction $d_i = (0, \ldots, 0, 1, 0, \ldots, 0)^t$, the unit vector with i − 1 zeros before the 1 and n − i zeros after it. The storage window is an (n − 1)-D tile in the n-D DG. The following pseudocode describes the movement of the storage window of p blocks. Without loss of generality, assume that $N_1 \le N_2 \le \cdots \le N_n$ (otherwise, the DG can be reindexed).

PROCEDURE 2.

    for i_1 = 1 to N_1 step p^(1/(n-1))
        ...
        for i_(n-1) = 1 to N_(n-1) step p^(1/(n-1))
            for i_n = 1 to N_n
                Schedule(i_1, ..., i_n)

where Schedule($i_1, \ldots, i_n$) schedules all the p blocks in the $p^{1/(n-1)} \times \cdots \times p^{1/(n-1)}$ (n − 1)-D storage window at node ($i_1, \ldots, i_n$) of the block-level DG to be brought into AU. The storage window is on a plane perpendicular to vector $d_n$ in the block-level DG, and moves along direction $d_n$.

The reasoning behind Procedure 2 is explained as follows. Consider a general URE in an n-D domain with dependence vectors $D = [\,d_1, d_2, \ldots, d_n\,]$. (If the number of dependences is larger than n, we consider only the n linearly independent ones.)
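The loop nest of the sequencing procedure can be sketched as a Python generator that yields the successive positions of the storage window. This is a minimal illustration that assumes p is a perfect (n − 1)th power and uses 1-based indices as in the text; the name `storage_window_sequence` and the 6 × 6 × 6 block-level mesh with p = 9 are illustrative choices, not values confirmed by the paper.

```python
from itertools import product


def storage_window_sequence(sizes, p):
    """Yield the window positions (i_1, ..., i_n) in the order of the loop nest.

    sizes: (N_1, ..., N_n), node counts of the block-level MMG along each axis
    p:     number of blocks held in the (n-1)-D storage window
    """
    n = len(sizes)
    side = round(p ** (1.0 / (n - 1)))  # edge length of the (n-1)-D tile
    if side ** (n - 1) != p:
        raise ValueError("this sketch assumes p is a perfect (n-1)th power")
    # The outer n-1 loops step by the tile edge; product() varies i_1 slowest.
    outer_ranges = [range(1, size + 1, side) for size in sizes[:-1]]
    for origin in product(*outer_ranges):
        # The innermost loop slides the tile one node at a time along d_n.
        for i_n in range(1, sizes[-1] + 1):
            yield origin + (i_n,)


positions = list(storage_window_sequence((6, 6, 6), 9))
```

For this mesh, the window starts at (1, 1, 1), sweeps six steps along the last axis on each of four parallel lines, and finishes at (4, 4, 6), so every block-level node is covered exactly once.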
Note that a set of n linearly independent dependence vectors $d_1, d_2, \ldots, d_n$ can be converted to a set of n unit vectors (corresponding to an n-D mesh) by a linear transformation or an appropriate basis change. Thus, the UREs are, in some sense, equivalent to each other from the point of view of dependences, and a sequencing scheme developed for n-D MMGs can be extended easily to other UREs. In the case of general UREs, the storage window is an (n - 1)-D tile formed by the first (n - 1) dependence vectors, and the tile is moved along the remaining vector $d_n$ in the DG.

The shape of a domain of a given URE is also taken into account in the sequencing scheme as follows. For a URE defined over an arbitrary convex domain, the sequencing procedure traverses the domain by a set of parallel 1D lines. The width of a line corresponds to an (n - 1)-D storage tile, and its direction denotes the movement of the (n - 1)-D tile along the nth dependence vector. A penalty is incurred each time the storage tile shifts from the tail of a line to the head of another parallel line (the term denoted by "XY plane except (0, 0, 0)" in (10) in the following section). The head, tail, and length of these parallel lines are different for different domain shapes. For an n-D mesh, all of the lines are parallel to vector $d_n = (0, \ldots, 0, 1)^t$, and are of the same length $N_n$. Thus, the regularization of a given DG into an MMG and the decomposition of the block-level DG into parallel lines present a uniform way of handling general UREs.

EXAMPLE 5. Fig. 8a shows the storage-window movement for computing a matrix product with N = 6, p = 9, where each tile is a 3 × 3 square. As the block-level DG is a full 3D mesh, it is perfectly tiled with 36 nodes covered in 4 tiles of size 9 each. Fig. 8b shows the storage-window movement for computing a transitive closure with N = , p = 4, where each tile is a 2 × 2 square. As the block-level DG is not a full 3D mesh, the 00 nodes in the block-level DG are fitted into 6 tiles of four nodes each. Note that Fig. 8b shows two rows of tiles; hence, there are four rows of blocks in the z-direction (since each tile is a 2 × 2 block).
As each block corresponds to a cubical mesh of nodes of the original DG, there are N = rows in the original DG. Finally, the block-level DG has N + nodes in the x-direction (Fig. 7b), and the block-level DG is staggered from plane to plane. Hence, when tiling it, we are grouping p nodes in the x and y directions into a tile, and there will be N + + p - tiles in the x-direction.

(Footnote: In general, the number of data accesses, which is the number of dependence vectors crossing the boundary of a tile, is not proportional to the perimeter and is hard to characterize mathematically. For cubical DGs (and n-D MMGs), since each node at the boundary of a tile has an incoming or outgoing dependence vector (the four corner nodes have both), the number of accesses is dependent on the perimeter. It is easy to construct cases in which all nodes on the periphery do not lead to any access; hence, the number of accesses in general is not proportional to the perimeter.)

The size of the storage tile is chosen to minimize the total number of data accesses over the link to MM. Since AU size is constant, the number of data accesses is proportional to the perimeter of the tile (see the footnote above). It can be easily seen that the shape

of the tile should be chosen as an equisided (n - 1)-D parallelepiped to minimize the total number of data accesses to MM. For the above sequencing scheme with an equisided tile, the number of data accesses from MM (or the I/O complexity denoted by Q) for a block-level DG in the form of an n-D mesh is

$Q = \frac{(n-1)\,V}{p^{1/(n-1)}} + \frac{V}{N_n},$ (7)

where V is the total number of index points in the n-D DG, and $N_n$ is the size of the largest dimension. Equation (7) can be obtained by considering the inputs along the n faces of an n-D MMG. Since the storage tile is moved along direction $d_n$ by the sequencing procedure, none of the data on the face perpendicular to $d_n$ is refetched, and the total number of accesses for inputs on this face is $V / N_n$. For inputs on the other faces perpendicular to $d_i$, $i = 1, \ldots, n - 1$, each input data is fetched $N_i / p^{1/(n-1)}$ times from MM. Hence, the total number of accesses for the (n - 1) faces perpendicular to $d_i$ is

$\sum_{i=1}^{n-1} \frac{V}{N_i} \cdot \frac{N_i}{p^{1/(n-1)}} = \frac{V\,(n-1)}{p^{1/(n-1)}}.$

The following lemma establishes that the above sequencing scheme for n-D MMGs is asymptotically optimal with respect to the number of accesses to MM.

LEMMA. For n-dimensional MMGs, $Q = \Omega\left(V / S^{1/(n-1)}\right)$, where S is the size of the limited memory and Q is the I/O complexity. (See [] for proof.)

In this case, S = p, and the number of accesses from MM due to the sequencing procedure, given by (7), has the same asymptotic complexity as the lower bound given in the lemma. (Note that the factor (n - 1) is constant for a specific MMG and does not affect the complexity.) Hence, our sequencing scheme is optimal in terms of the number of accesses from MM.

Fig. 8. Storage-window movement in the block-level DG for computing (a) matrix product and (b) transitive closure. The shaded areas show the storage windows or tiles. For (a), N = 6, p = 9, and there are 4 tiles. For (b), N = , p = 4, and there are 6 tiles.

5 APPLICATIONS: MATRIX PRODUCT AND TRANSITIVE CLOSURE

In this section, we present our results in mapping algorithms described by MMGs on a CoP. Our results are based on the matrix-product and transitive-closure applications.
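Before turning to the applications, the I/O complexity in (7) can be evaluated numerically with a short sketch. The function names and the 6 x 6 x 6 block-level mesh used below are hypothetical choices for illustration; the formula itself is (7), and the comparison scale V / S^{1/(n-1)} with S = p is the one stated in the lemma.

```python
def io_accesses(dims, p):
    """Number of MM accesses Q from Equation (7) for an n-D block-level mesh.

    dims: (N_1, ..., N_n), block-level mesh sizes.
    p:    number of blocks held in the (n - 1)-D storage window.
    """
    n = len(dims)
    volume = 1
    for size in dims:
        volume *= size
    tile_edge = p ** (1.0 / (n - 1))
    # (n - 1) refetched faces plus the face perpendicular to d_n (never refetched).
    return (n - 1) * volume / tile_edge + volume / max(dims)


def lower_bound_scale(dims, p):
    """The quantity V / S^(1/(n-1)) from the lemma, with S = p."""
    n = len(dims)
    volume = 1
    for size in dims:
        volume *= size
    return volume / p ** (1.0 / (n - 1))


q = io_accesses((6, 6, 6), 9)
print(q, q / lower_bound_scale((6, 6, 6), 9))
```

For a fixed mesh, enlarging the storage window reduces Q, while the ratio of Q to the lower-bound scale stays bounded by a constant that depends only on n, which is the sense in which the scheme is asymptotically optimal.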
Figs. , 7, and 8 show, respectively, the original MMG, the block-level DG after partitioning, and the movement of the tiles for these two problems.

For the matrix-product problem, C = A × B has a 3D MMG of size N × N × N, where A and B are two N × N matrices. As shown in Section ., this DG is partitioned into cubical blocks of size m × m × m. The AU holds a $\sqrt{p} \times \sqrt{p}$ square tile of blocks of DG (formed along dependences $(1, 0, 0)^t$ and $(0, 1, 0)^t$), and the tiles move along dependence vector $(0, 0, 1)^t$.

For the transitive-closure problem, darkened nodes representing delay nodes are first added to regularize the original DG to a cubical structure (Fig. b). The 3D MMG is then partitioned into cubical blocks of size m × m × m, each of which can be executed in one pass on PA. The size of the block-level DG after partitioning is N + N + N. Note that the block-level DG is identical to the original DG except for its dimensions. The storage window has size $\sqrt{p} \times \sqrt{p}$ blocks and moves along direction $(0, 1, 0)^t$ as shown in Fig. 8b in order to minimize the total number of data accesses.

These examples illustrate the benefit of deriving MMGs for nested-loop algorithms, as we have a uniform way of partitioning an MMG, sequencing blocks of the MMG, and designing a PE array to execute a block of the MMG.

5.1 Evaluation Metrics

To develop a cost-effective design, we need to evaluate the performance of the target algorithms on a given amount of

silicon chip area. In this subsection, we present an abstract model to estimate the area consumed and the completion time of execution.

5.1.1 Area Model

The total area occupied by the CoP architecture is the sum of the areas of the PEs (including input/output logic), AU, controller, and input and output networks in AU:

$Area = Area_{PA} + Area_{AU} + Area_{pins} + Area_{controller} + Area_{network}.$ (8)

The last three terms can be assumed to be constants. (This assumption is especially true for PAs with linear configurations because the number of ports in such PAs and AUs is constant. The analysis in the rest of this section will still be valid.) If the values of $Area_{pins}$, $Area_{controller}$, and $Area_{network}$ are small, as compared to the values of $Area_{PA}$ and $Area_{AU}$, then the area index containing the dominant terms is given by

$AreaIndex = \underbrace{\#PE\,(A_{pe} + b)}_{Area_{PA}} + \underbrace{p\,m^2 + 3\sqrt{p}\,m^2}_{Area_{AU}},$ (9)

where $A_{pe}$ is the area of a single PE in memory words (which captures the implementation cost of a PE), and b reflects the local memory per PE. Since in our examples AU holds $\sqrt{p} \times \sqrt{p}$ blocks of the DG forming a square in the 3D MMG (Fig. 8), storage is needed in AU for 1) $p\,m^2$ words of C, 2) $\sqrt{p}\,m^2$ words of A, 3) $\sqrt{p}\,m^2$ words of B, and 4) additional $\sqrt{p}\,m^2$ words for the next set of $\sqrt{p} \times \sqrt{p}$ blocks to be brought in from MM to AU for future processing.

In the following sections, we use the simplified area model as a cost index to analyze the cost-performance relation of the CoP architecture. It is, however, important to point out that in some cases $Area_{controller}$ may be large and need to be considered in computing the area. Similarly, $Area_{network}$ and $Area_{pins}$ may depend on word size, number of buffers, and number of paths to PA. The net effect of including these additional areas in our area model is to shift all the cost-performance curves in Figs. 10 and 11 to the right. The analysis is similar and will not be shown.

5.1.2 Model of Completion Time

The total completion time $T_{compl}$ in PE-cycles is given by

T copl F H p K N = axg pt J,, B block p - MM F HG F G IF KHG f p KXY N, p p + - BMM p KJ + + BMM XY plane except block( 0, 0, 0) + ax - H p, p t B block MM K a b block( N, N, N) g a I I J f I KJ block( 0, 0, 0) (10)

where $B_{MM}$ is the MM-AU bandwidth in words per PA clock tick, $t_{block}$ is the nonoverlapped time (total completion time minus the portions of the load/drain times overlapped by the following/preceding blocks) taken to execute one block of DG by PA, K(N, m) is the number of blocks of the DG or the number of nodes in the block-level DG, and $K_{XY}(N, m)$ is the number of nodes in the projection of the block-level DG to the XY plane parallel to the storage tile of AU; i.e., $K_{XY}(N, m)$ is the number of times the storage tile has to change direction. The term "XY plane except block(0, 0, 0)" in (10) means storage tiles that can be fetched without shifting from the tail of a fetch direction to the head of another parallel fetch direction.

(Footnote: For any block of DG, the processing time includes the load, drain, and computation times of the block on PA. In GPM [], [9], block scheduling is done to overlap consecutive blocks entering PA, thereby reducing the effective load, drain (and maybe the computation time) of a block. The nonoverlapped time for a block refers to those portions of the load, drain, and computation times that are not masked by successive blocks.)

The first term in (10) is the dominant term and is the product of the time taken for each window of $\sqrt{p} \times \sqrt{p}$ blocks in AU and the number of windows over DG. For each window of size $\sqrt{p} \times \sqrt{p}$, $2\sqrt{p}\,m^2 / B_{MM}$ is the time it takes to fetch the elements needed for the next window, and $p\,t_{block}$ is the time to process p blocks in the current window. The second term in (10) models the additional time required whenever the window changes direction, which involves writing and reading $p\,m^2$ elements of the output matrix. The third term is the initial latency to load the data corresponding to $\sqrt{p} \times \sqrt{p}$ blocks in the storage window. For a $\sqrt{p} \times \sqrt{p}$ storage window, there are $p\,m^2$ elements of the result matrix (corresponding to the area of the tile) and $2\sqrt{p}\,m^2$ elements of the input matrix (corresponding to half of the perimeter). The final term is the additional time over the time $p\,t_{block}$ for the results of the final storage tile to be written back from AU to MM. Note that there are $p\,m^2$ elements of the output matrix to be written back to MM, and $t_{block}$ is the earliest time after the start of the last storage window when the first elements are available to be written back.

EXAMPLE 6. For the matrix-product problem, as shown in Fig. 8a,

$K(N, m) = \left\lceil \frac{N}{m} \right\rceil^3, \qquad K_{XY}(N, m) = \left\lceil \frac{N}{m} \right\rceil^2.$

For the transitive-closure problem with 3D MMG, as shown in Fig. 8b,

K XY a a f K N, = f L M L M OI PKJ OI PKJ O L M O P L M O P N + N + P N + N N, =. O P L M

5.1.3 AU Size

In order to have efficient processing, AU must be large enough to mask MM latency fully; i.e., completely overlap

the loading of the inputs of the next storage window with the processing of the current one. Therefore,

$p\,t_{block} \ge \frac{2\sqrt{p}\,m^2}{B_{MM}} \;\Longrightarrow\; p \ge \left( \frac{2\,m^2}{B_{MM}\,t_{block}} \right)^2,$ (11)

and AU size is computed using (9) for this value of p.

EXAMPLE 7. Let N = 512, m = 8, and $B_{MM}$ = 1/5 (five cycles per word). The number of processors is #PE = $m^2$ = 64. For a square array, $t_{block}$ = m = 8. By (11), p ≥ 6400, and the minimum AU size is about 415K words or 13.3M bits, assuming four bytes/word. For a linear PA, $t_{block}$ = $m^2$ = 64. By (11), p ≥ 100, and the minimum AU size is about 8.1K words or 260K bits. Although 13M bits of fast memory in AU is at the limit of current technology, 260K bits is very feasible. For a linear PA, as $t_{block} = O(m^2)$, p is independent of m. This is true because, for a linear PA used for block processing, its I/O bandwidth is constant, independent of the number of PEs and block size, and AU size in blocks (p) depends only on the bandwidth between MM and AU.

5.1.4 Performance Metrics

The performance index of a CoP is defined as its speedup over a reference design:

$PerformanceIndex = \frac{T^{ref}_{compl}}{T^{cop}_{compl}},$ (12)

where $T^{ref}_{compl}$ and $T^{cop}_{compl}$ are the total completion times in PE-cycles of the reference design and the CoP, respectively. A CoP with one PE and an appropriate amount of AU memory to mask MM latency is chosen as the reference design. Thus, both the reference and current designs have the same bandwidth limitation. The AreaIndex of the one-PE reference design in memory words can be obtained from (9) and (11) when m = 1, $t_{block}$ = 1, and #PE = 1. It is given by

$AreaIndex_{ref} = A_{pe} + b + \frac{4}{B_{MM}^2} + \frac{6}{B_{MM}}.$ (13)

A useful trade-off in designing a CoP is as follows. Suppose we increase area by a certain factor; how much will speedup be increased (or how much will completion time be reduced) if we clock the reference machine and the CoP at the same rate? The speedup over the reference design can be interpreted as the reduction in clock rate in order to obtain the same throughput (or completion time).
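The AU-size bound in (11) and the numbers in Example 7 can be checked with a short sketch. It assumes the reading of (11) as p ≥ (2m²/(B_MM t_block))² and expresses the bandwidth as an integer number of cycles per word (B_MM = 1/cycles_per_word) so that the arithmetic stays exact; the function name `min_au_blocks` is hypothetical.

```python
import math
from fractions import Fraction


def min_au_blocks(m, t_block, cycles_per_word):
    """Smallest integer p satisfying (11), with B_MM = 1 / cycles_per_word."""
    # sqrt(p) must be at least 2 m^2 / (B_MM * t_block).
    sqrt_p = Fraction(2 * m * m * cycles_per_word, t_block)
    return math.ceil(sqrt_p * sqrt_p)


# Example 7: m = 8 and five cycles per word.
print(min_au_blocks(8, 8, 5))    # square array, t_block = m
print(min_au_blocks(8, 64, 5))   # linear array, t_block = m^2
```

Because t_block grows as m² for a linear PA, the m² factors cancel and the bound on p no longer depends on the block size, which matches the remark that closes Example 7.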
In other words, we have

$\frac{T^{ref}_{compl}}{Clock_{ref}} = \frac{T^{PA}_{compl}}{Clock_{PA}} \;\Longrightarrow\; \frac{Clock_{ref}}{Clock_{PA}} = \frac{T^{ref}_{compl}}{T^{PA}_{compl}},$ (14)

where $T^{ref}_{compl}$ and $Clock_{ref}$ are the completion time and clock rate of the reference design, respectively. Sections 5.4 and 5.5 present cost-performance trade-offs of CoPs where cost is measured as AreaIndex defined in (9), and performance is measured as speedup over the one-PE reference design ((12)).

A reduced clock rate is desirable for several reasons. First, and most important, the yield would be significantly higher if the chip were designed for a lower clock rate. Also, power dissipation is lower at lower clock rates, leading to lower packaging and integration costs.

5.2 Relationship between Performance and Area Indices

For a 3D MMG, if the size of PA is increased by a factor q, then block size m will have to be increased by q for single-pass execution of each block by PA. For a linear PA, $t_{block} = O(m^2)$, and the number of blocks in AU (p) is independent of block size ((11)). Therefore, the area of AU for a linear PA increases by $q^2$ when m is increased by q ((9)). The total area (AreaIndex) of the CoP, which is dominated by the area of AU, increases by a factor $q^2$ when m is increased by q. Hence, AreaIndex grows as the square of the number of PEs in a linear PA.

The following argument shows that AU size has to grow at least as the square of the number of PEs in order to mask MM latency when processing a cubical block of DG on a linear PA. Consider a cubical m × m × m block of DG to be processed in a linear PA. For a linear PA to have constant I/O bandwidth, the time to process a block is $\Omega(m^2)$, as there are $O(m^2)$ input and output elements to be loaded into PA. Therefore, AU size has to be $\Omega(m^2)$, as all the $O(m^2)$ elements needed to process a block have to be held in AU to mask the MM latency. The number of PEs in PA is O(m), as there are $m^3$ operations to be completed in $O(m^2)$ time. Therefore, to mask MM latency, AU size (given by $\Omega(m^2)$) grows at least as the square of the number of PEs (given by O(m)).

The completion time $T_{compl}$ can, at best, decrease by q when the number of PEs is increased by q (superlinear speedups are not possible for deterministic processing).
The area index grows at least as $q^2$ when the number of PEs is increased by q. Thus, clock-rate reduction can grow at best as the square root of AreaIndex when the effect of MM latency is masked completely. However, beyond a certain number of PEs (or AreaIndex) for a fixed problem size, the completion time is bounded by the fixed MM bandwidth and is equal to the time to read and write the elements of input and output matrices. Hence, speedup will flatten out beyond a certain AreaIndex ($A_{crit}$) when MM becomes a bottleneck. That is,

$Speedup = \frac{T^{ref}_{compl}}{T^{cop}_{compl}} = \begin{cases} O\left(\sqrt{AreaIndex}\right) & \text{for } AreaIndex \le A_{crit} \\ T^{ref}_{compl} \Big/ \dfrac{Volume(N)}{B_{MM}} & \text{for } AreaIndex > A_{crit} \end{cases}$ (15)

where Volume(N) is the total amount of data in the 3D MMG to be accessed (including both reads and writes) to process the given n-dimensional uniform dependence algorithm. In short, if the completion time of the reference PA is fixed, then we have the following inverse square-root relationship between the completion time of PA and its AreaIndex for fixed MM bandwidth and specified clock rate in processing a 3D MMG:

$T^{PA}_{compl} \propto \frac{1}{\sqrt{AreaIndex_{PA}}}, \quad \text{for } AreaIndex_{PA} \le A_{crit}.$ (16)

5.3 Area Allocation between AU and PA

Fig. 9 shows the variation of the fraction of total area occupied by PA as we increase the total area of the CoP. The AU area in (9) is computed using p from (11), which is chosen to overlap memory fetches completely with computations. Note that the plot is dependent on p and AreaIndex but not on the application problem and its problem size. The x-axis is the area of the chip in memory words. Thus, AreaIndex of five megawords corresponds to an area equivalent to 160 megabits of storage, assuming each word is 32 bits. The cost of a PE in memory words is denoted as $A_{pe}$. Hence, for five megawords of total area and $A_{pe}$ = 00, only four percent of the total area is occupied by PEs for a linear PA configuration. This shows that most of the chip area is taken up by AU if we design the chip with the optimal balance where memory latency is fully masked. Moreover, for the same total area, a linear PA has more of its area devoted to PEs than a PA in the form of a square mesh. The effect of increased area of a PE is to lift the entire plot upwards for both linear and square PAs. Although this figure is for $B_{MM}$ = 1/5 (five cycles to access a word from memory), the same effect is observed for other bandwidths.

Fig. 9. Area allocation between AU and PA of a CoP for masking MM latency fully. The area of the one-PE reference design is equal to 0 words for $A_{pe}$ = 00, and ,0 words for $A_{pe}$ = ,000 ((13)).

5.4 Cost-Performance Trade-Offs: Matrix Product

Fig. 10a shows the cost-performance trade-offs of CoPs for computing matrix products described by 3D MMGs. Performance is measured in speedup (or clock rate), and cost is measured as AreaIndex. The system is designed at the balance point to mask MM latency fully ((11)). Trade-offs are shown for problem sizes of N = 512 and N = 1,024 when the latency to access a word from MM is five cycles. The area cost of a PE in memory words is denoted as $A_{pe}$. The amount of local memory in each PE is controlled by parameter b. In our approach, for a given block size, we obtain a virtual array (linear or square), and cluster b virtual PEs to obtain increased local memory per PE and reduced number of PEs.
Hence, the larger the value of b, the lower are the number of PEs and the clock-rate reduction. For b = 1, each physical (and virtual) PE has three words of storage, one each for A, B, and C.

Fig. 10a shows that for about . × 10^6 words of silicon area and $A_{pe}$ = 00, we can get a speedup of about 5 over the one-PE reference design for a square PA and 8 for a linear PA in computing a 1,024-by-1,024 matrix product. In other words, if we clock a CoP with a linear PA at 00 KHz, we will obtain performance equivalent to that of a one-PE reference design running at 9. MHz. If we can clock this CoP at MHz, then we will obtain .5 times speedup as compared to the one-PE reference design. The final speedup can be chosen from a variety of alternatives depending on the objective of the design.

Fig. 10a further shows the square-root relationship between speedup and AreaIndex (Section 5.2). For instance, to compute a 1,024-by-1,024 matrix product using a linear PA with $A_{pe}$ = 5,000, the speedup only doubles from 0 to 0 when AreaIndex is increased from 0.5 megawords to .5 megawords. The degradation in performance in using a linear PA rather than a square array when AreaIndex is about . megawords is six percent for $A_{pe}$ = 00 and 6 percent for $A_{pe}$ = 5,000.

Fig. 10a also shows that performance of a CoP saturates because of the MM bottleneck with limited bandwidth. For smaller problems, speedup will flatten out earlier. After the performance saturates, the completion time of the CoP is equal to the time to access input matrices A and B from MM and to write output matrix C to MM. Hence, the maximum speedup (or clock-rate reduction) is given by

$\frac{T^{ref}_{compl}}{T^{cop}_{compl}} = \frac{Clock_{ref}}{Clock_{cop}} = \frac{N^3}{4N^2 / B_{MM}} = 51.2$ for N = 1,024.

This maximum value of 51.2 corresponds to AreaIndex . megawords ($A_{crit}$ = . megawords) when using a linear PA and $A_{pe}$ = 00. The saturation points improve as products of larger matrices are computed. Our earlier work [], [9] shows that speedup does not saturate in the range evaluated in Fig. 10 when N = 0,0.

The results in Fig. 10a show that a linear PA is an attractive choice for computing matrix products and other UREs. Its advantages are its constant I/O bandwidth and modularly expandable layout.
It achieves good performance because our mapping algorithm can effectively exploit locality in the three-dimensional loops.

Fig. 10b depicts the performance for linear and square PAs with b = 8. Here, eight virtual PEs are coalesced into a physical PE, leading to an eight-fold increase in local memory of a PE. While the corresponding curves for square PAs remain unchanged from Fig. 10a, the curves for linear PAs are lower than the corresponding ones in Fig. 10a. Again, speedup grows as the square root of AreaIndex when MM latency is masked.

Fig. 10c shows the effect of increasing $B_{MM}$ between MM and AU. The figure shows that MM bandwidth is a key factor that influences the performance of CoPs. For instance, when AreaIndex is five megawords, performance doubles from 5. to 8 when $B_{MM}$ is increased from 1/5 (five cycles/word) to 1/2 (two cycles/word).

Finally, Fig. 10d shows the sensitivity of speedup to variations in AU size. Assume that AU size is scaled by a

factor of a from the optimal value given in (11). If a < 1, all PEs will be idle between the time the current set of p blocks is completed and the start of the next set of p blocks. Therefore, speedup (or clock-rate reduction) ((15)) will decrease as a is decreased. If a > 1, it means that less area will be allocated to PA for a fixed chip area. Since PA is already working at the maximum efficiency, the additional area allocated to AU cannot help boost the performance of the CoP, resulting in a performance curve that is below the performance curve when a = 1. When a > 1 and AU size is larger than the total number of words in the input matrices, the extra area allocated to AU beyond that needed by the input matrices will be unused.

Fig. 10. Evaluation of CoPs for computing matrix products. (a) Cost-performance trade-offs. The maximum speedup is 51.2 for four of the plotted curves and 25.6 for the other two. (b) Effect of b (local memory per PE) on cost-performance trade-offs. Local memory in a PE is used to simulate a set of virtual PEs and to reduce the PE count. (c) Effect of bandwidth ($B_{MM}$) on cost-performance trade-offs. One plot does not saturate because the MM bandwidth is higher than the data rate required by PA. (d) Sensitivity of cost-performance trade-offs for the matrix-product problem when the optimal AU size is multiplied by a. If a < 1, all the PEs are idle for some time due to insufficient data accessed from MM. If a > 1, chip area is wasted by the extra memory in AU. The maximum speedup for all curves is 51.2.
In this case, we set AU size to the maximum size needed by the input matrices. This explains why the critical AreaIndex $A_{crit}$ in Fig. 10d is the same for all a, and why

the performance curve for a > 1 merges with the optimal performance curve (a = 1) at $A_{crit}$.

Fig. 10d further shows that for $A_{pe}$ = ,000, AreaIndex = 5M words, and a 500-fold decrease in AU size (a = 0.002), speedup drops from 5. to 6. This shows that linear-PA designs have good resilience to changes in AU size from the desired optimal configurations. The low sensitivity to AU size can be used to obtain significant area savings without large sacrifices in performance. For instance, when a = 0. and AreaIndex < .7M words, AU is one-third smaller than the case when a = 1.0, resulting in 58 percent area savings (from Figs. 9 and 10d) for only eight percent decrease in performance (where speedup is reduced from 5 to 7). This resilience allows designers to fine-tune the designs in order to obtain different area-performance trade-offs.

Fig. 11. Evaluation of CoPs for computing transitive closures. (a) Cost-performance trade-offs. (b) Effect of b (local memory per PE) on cost-performance trade-offs. (c) Effect of bandwidth ($B_{MM}$) on cost-performance trade-offs. One plot does not saturate as MM bandwidth is higher than the data rate required by PA. (d) Sensitivity of cost-performance trade-offs for the transitive-closure problem when the optimal AU size is multiplied by a. The maximum speedup for all curves is .8.

5.5 Cost-Performance Trade-Offs: Transitive Closure

Fig. 11 shows the cost-performance trade-offs in applying CoPs to solve transitive-closure problems described by 3D MMGs. The x-axis is the cost or AreaIndex in memory words, and the y-axis is the speedup over the one-PE reference design. Due to the more irregular nature of the transitive-closure algorithm, the performance of CoPs in computing transitive closures is lower than that in matrix products. For instance, Fig. 11a shows a speedup of only .8 over the one-PE reference design (as compared to 5. for the matrix-product problem) using a linear PA with five megawords of silicon area, $A_{pe}$ = ,000, and N = 1,024. For the same configuration, Fig. 11b shows that increased local memory in each PE (b = 8) results in scaling the performance down to .6. Likewise, Fig. 11c shows improved performance when $B_{MM}$ is increased. Finally, Fig. 11d establishes that cost-performance trade-offs are not very sensitive to the AU size required for masking MM latency fully. For instance, when a = 0. and AreaIndex = .86, AU is one-third smaller than when a = 1.0, resulting in a 58 percent area savings for a six percent decrease in performance (where speedup is reduced from .8 to .7).

As in the case of computing matrix products, the saturation points improve when larger matrices are involved. Our earlier work [], [9] shows that speedup does not saturate in the range evaluated in Fig. 11 when N = 0,0.

6 CONCLUSIONS

This paper describes the design of a coprocessor (CoP) for executing loop computations described by uniform dependence algorithms. Our results show that high performance is achieved by a modularly expandable linear array of PEs, coupled with local buffer memory that interfaces a Main Memory (MM) with fixed bandwidth. The high performance is achieved by efficient methods for mapping recurrences on CoPs. There are three major contributions in this paper.

For the class of multimesh dependence graphs, we have developed a partitioning scheme that we have proved to be asymptotically optimal. Other existing schemes have been developed for more general dependence graphs and may be suboptimal with respect to multimesh graphs. It is important to point out that optimality is only asymptotic; that is, for a specific problem instance and Access-Unit (AU) size, it is possible that our partitioning scheme may induce more memory traffic than an existing scheme.

In computing matrix products and transitive closures, we have found the optimal division of chip area between Processor Array (PA) and AU for given chip size, MM bandwidth, and tiling method. We have also shown area-speedup trade-offs when optimal partitioning of chip area is not done. Such trade-offs are important when combining multiple CoP chips together to form a large CoP system.

For dependence graphs represented as three-dimensional multimesh graphs (MMGs) and processor arrays in a linear configuration, our study finds an important relation that performance is related to the inverse square root of cost. Here, cost is measured as silicon area, and performance, as completion time under fixed main-memory bandwidth and constant clock speed ((16)). Another interpretation of performance is in the reduction of clock speed, where increases in AreaIndex in (15) can be considered as reduction in clock rate for a given completion time. We have demonstrated the inverse square-root relation using the matrix-product and transitive-closure applications. In addition, we have evaluated CoPs of various configurations and have studied the effects of different parameters on the inverse square-root relationship.

Although our results have been derived for three-dimensional multimesh dependence graphs, they can be extended to multimesh graphs of higher dimensions. Given a linear processor array, performance will be related to the inverse (n - 1)th root of AreaIndex, where n is the dimension of the multimesh graphs.

There are two areas that we will study in the future.
1) The effectiveness of uniformizing non-uniform recurrences into multimesh graphs. We will need to evaluate alternative multimesh dependence-graph representations of a given dependence graph.
2) The design of a hardware controller and the associated software compiler to support the partitioning and mapping processes.

APPENDIX

Proof of Lemma

A given partitioning matrix P is valid if the compressed or block-level DG is acyclic. Hence, given a partitioning vector $p_i$, all of the dependence vectors $d_j$ should cross the hyperplane corresponding to $p_i$ in the same direction; i.e., $p_i^t d_j \ge 0$ for all j, or $p_i^t d_j < 0$ for all j. Fig. 12 shows the case in which $p_i$ is not valid (Block 1 depends on Block 2 and vice versa).

Proof of Corollary

Consider column $p_i$ of partitioning matrix P = D, where D is an orthonormal matrix. Hence, $p_i^t d_i = d_i^t d_i = |d_i|^2 > 0$, where $|d_i|$ is the magnitude of $d_i$. By the lemma, P is valid. Also, $p_i^t d_j = 0$ for $j \ne i$; hence, it satisfies (5).

Fig. 12. A pair of cyclically dependent blocks. Block 1 depends on block 2 through one dependence vector, while block 2 depends on block 1 through another. Hence, partitioning vector $p_i$ is invalid.
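The validity condition used in the appendix can be tested mechanically. The sketch below is an illustration only: the function name is hypothetical, vectors are plain tuples, and the condition is read as "all inner products share a sign", with zero allowed on either side, which is an assumption about the intended inequality.

```python
def is_valid_partition_vector(p, deps):
    """Return True if every dependence vector crosses the hyperplane
    with normal p in the same direction (or lies within it)."""
    dots = [sum(a * b for a, b in zip(p, d)) for d in deps]
    return all(x >= 0 for x in dots) or all(x <= 0 for x in dots)


# Identity dependence matrix of a 3-D MMG.
deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

print(is_valid_partition_vector((1, 0, 0), deps))    # a column of P = D
print(is_valid_partition_vector((1, -1, 0), deps))   # mixed signs: cyclic blocks
```

With P = D and orthonormal dependence vectors, each column gives one positive inner product and zeros elsewhere, so the check succeeds, which mirrors the corollary; a vector such as (1, -1, 0) produces inner products of both signs and corresponds to the cyclic situation illustrated in the appendix figure.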

ACKNOWLEDGMENTS

Research supported by Joint Services Electronics Program contract N J-70, U.S. National Science Foundation grants MIP and MIP 96-6, and an IBM graduate fellowship grant. A preliminary version containing about 5 percent of this paper appeared in the Proceedings of the IEEE Symposium on Parallel and Distributed Processing, pp , December 99.

Kumar Ganapathy received his BTech degree from the Indian Institute of Technology, Madras, India, his MS in electrical engineering from the University of Massachusetts at Amherst in 1990, and his PhD in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 99. Since 99, he has been with Rockwell Semiconductor Systems, where he has played a key part in developing high-performance general-purpose DSP processors for high-speed communication and wireless applications. His research interests include DSP architectures, compilers, special-purpose and systolic architectures, processor validation, and low-power design techniques.

Benjamin W. Wah received his PhD degree in computer science from the University of California at Berkeley in 1979. He is currently a professor in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign. He previously served on the faculty of Purdue University ( ), as a program director at the National Science Foundation ( ), as Fujitsu Visiting Chair Professor of Intelligence Engineering at the University of Tokyo (99), and as McKay Visiting Professor of Electrical Engineering and Computer Science at the University of California at Berkeley (99). In 1989, he was named a University Scholar of the University of Illinois. His current research interests are in the areas of parallel and distributed processing, knowledge engineering, and optimization. Dr. Wah was editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering for , and serves on the editorial boards of Information Sciences, International Journal on Artificial Intelligence Tools, and Journal of VLSI Signal Processing. He has chaired a number of international conferences and is currently serving the IEEE Computer Society as its treasurer and a member of its governing Board. He is the chair of the 1997 IEEE-CS Fellow Evaluation Committee. He is a fellow of the IEEE.

Chien-Wei Li graduated from National Taiwan University (BS, 1990, MS, 99; both in computer science) and is currently a PhD student in the Computer Science Department and a research assistant at the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. His research interests are in computer architecture and compilers.

THE EQUIVALENCE OF GRAM-SCHMIDT AND QR FACTORIZATION (page 227) Gram-Schmidt provides another way to compute a QR decomposition: n

THE EQUIVALENCE OF GRAM-SCHMIDT AND QR FACTORIZATION (page 227) Gram-Schmidt provides another way to compute a QR decomposition: n HE EQUIVAENCE OF GRA-SCHID AND QR FACORIZAION (page 7 Ga-Schdt podes anothe way to copute a QR decoposton: n gen ectos,, K, R, Ga-Schdt detenes scalas j such that o + + + [ ] [ ] hs s a QR factozaton of

More information

GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS

GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS #A39 INTEGERS 9 (009), 497-513 GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS Mohaad Faokh D. G. Depatent of Matheatcs, Fedows Unvesty of Mashhad, Mashhad,

More information

A. Proofs for learning guarantees

A. Proofs for learning guarantees Leanng Theoy and Algoths fo Revenue Optzaton n Second-Pce Auctons wth Reseve A. Poofs fo leanng guaantees A.. Revenue foula The sple expesson of the expected evenue (2) can be obtaned as follows: E b Revenue(,

More information

Chapter 8. Linear Momentum, Impulse, and Collisions

Chapter 8. Linear Momentum, Impulse, and Collisions Chapte 8 Lnea oentu, Ipulse, and Collsons 8. Lnea oentu and Ipulse The lnea oentu p of a patcle of ass ovng wth velocty v s defned as: p " v ote that p s a vecto that ponts n the sae decton as the velocty

More information

gravity r2,1 r2 r1 by m 2,1

gravity r2,1 r2 r1 by m 2,1 Gavtaton Many of the foundatons of classcal echancs wee fst dscoveed when phlosophes (ealy scentsts and atheatcans) ted to explan the oton of planets and stas. Newton s ost faous fo unfyng the oton of

More information

2/24/2014. The point mass. Impulse for a single collision The impulse of a force is a vector. The Center of Mass. System of particles

2/24/2014. The point mass. Impulse for a single collision The impulse of a force is a vector. The Center of Mass. System of particles /4/04 Chapte 7 Lnea oentu Lnea oentu of a Sngle Patcle Lnea oentu: p υ It s a easue of the patcle s oton It s a vecto, sla to the veloct p υ p υ p υ z z p It also depends on the ass of the object, sla

More information

Set of square-integrable function 2 L : function space F

Set of square-integrable function 2 L : function space F Set of squae-ntegable functon L : functon space F Motvaton: In ou pevous dscussons we have seen that fo fee patcles wave equatons (Helmholt o Schödnge) can be expessed n tems of egenvalue equatons. H E,

More information

Thermoelastic Problem of a Long Annular Multilayered Cylinder

Thermoelastic Problem of a Long Annular Multilayered Cylinder Wold Jounal of Mechancs, 3, 3, 6- http://dx.do.og/.436/w.3.35a Publshed Onlne August 3 (http://www.scp.og/ounal/w) Theoelastc Poble of a Long Annula Multlayeed Cylnde Y Hsen Wu *, Kuo-Chang Jane Depatent

More information

arxiv: v2 [cs.it] 11 Jul 2014

arxiv: v2 [cs.it] 11 Jul 2014 A faly of optal locally ecoveable codes Itzhak Tao, Mebe, IEEE, and Alexande Bag, Fellow, IEEE axv:1311.3284v2 [cs.it] 11 Jul 2014 Abstact A code ove a fnte alphabet s called locally ecoveable (LRC) f

More information

Multistage Median Ranked Set Sampling for Estimating the Population Median

Multistage Median Ranked Set Sampling for Estimating the Population Median Jounal of Mathematcs and Statstcs 3 (: 58-64 007 ISSN 549-3644 007 Scence Publcatons Multstage Medan Ranked Set Samplng fo Estmatng the Populaton Medan Abdul Azz Jeman Ame Al-Oma and Kamaulzaman Ibahm

More information

Rigid Bodies: Equivalent Systems of Forces

Rigid Bodies: Equivalent Systems of Forces Engneeng Statcs, ENGR 2301 Chapte 3 Rgd Bodes: Equvalent Sstems of oces Intoducton Teatment of a bod as a sngle patcle s not alwas possble. In geneal, the se of the bod and the specfc ponts of applcaton

More information

UNIT10 PLANE OF REGRESSION

UNIT10 PLANE OF REGRESSION UIT0 PLAE OF REGRESSIO Plane of Regesson Stuctue 0. Intoducton Ojectves 0. Yule s otaton 0. Plane of Regesson fo thee Vaales 0.4 Popetes of Resduals 0.5 Vaance of the Resduals 0.6 Summay 0.7 Solutons /

More information

Distinct 8-QAM+ Perfect Arrays Fanxin Zeng 1, a, Zhenyu Zhang 2,1, b, Linjie Qian 1, c

Distinct 8-QAM+ Perfect Arrays Fanxin Zeng 1, a, Zhenyu Zhang 2,1, b, Linjie Qian 1, c nd Intenatonal Confeence on Electcal Compute Engneeng and Electoncs (ICECEE 15) Dstnct 8-QAM+ Pefect Aays Fanxn Zeng 1 a Zhenyu Zhang 1 b Lnje Qan 1 c 1 Chongqng Key Laboatoy of Emegency Communcaton Chongqng

More information

Energy in Closed Systems

Energy in Closed Systems Enegy n Closed Systems Anamta Palt palt.anamta@gmal.com Abstact The wtng ndcates a beakdown of the classcal laws. We consde consevaton of enegy wth a many body system n elaton to the nvese squae law and

More information

8 Baire Category Theorem and Uniform Boundedness

8 Baire Category Theorem and Uniform Boundedness 8 Bae Categoy Theoem and Unfom Boundedness Pncple 8.1 Bae s Categoy Theoem Valdty of many esults n analyss depends on the completeness popety. Ths popety addesses the nadequacy of the system of atonal

More information

If there are k binding constraints at x then re-label these constraints so that they are the first k constraints.

If there are k binding constraints at x then re-label these constraints so that they are the first k constraints. Mathematcal Foundatons -1- Constaned Optmzaton Constaned Optmzaton Ma{ f ( ) X} whee X {, h ( ), 1,, m} Necessay condtons fo to be a soluton to ths mamzaton poblem Mathematcally, f ag Ma{ f ( ) X}, then

More information

Optimal Design of Step Stress Partially Accelerated Life Test under Progressive Type-II Censored Data with Random Removal for Gompertz Distribution

Optimal Design of Step Stress Partially Accelerated Life Test under Progressive Type-II Censored Data with Random Removal for Gompertz Distribution Aecan Jounal of Appled Matheatcs and Statstcs, 09, Vol 7, No, 37-4 Avalable onlne at http://pubsscepubco/ajas/7//6 Scence and Educaton Publshng DOI:069/ajas-7--6 Optal Desgn of Step Stess Patally Acceleated

More information

Scalars and Vectors Scalar

Scalars and Vectors Scalar Scalas and ectos Scala A phscal quantt that s completel chaacteed b a eal numbe (o b ts numecal value) s called a scala. In othe wods a scala possesses onl a magntude. Mass denst volume tempeatue tme eneg

More information

24-2: Electric Potential Energy. 24-1: What is physics

24-2: Electric Potential Energy. 24-1: What is physics D. Iyad SAADEDDIN Chapte 4: Electc Potental Electc potental Enegy and Electc potental Calculatng the E-potental fom E-feld fo dffeent chage dstbutons Calculatng the E-feld fom E-potental Potental of a

More information

PHYS 705: Classical Mechanics. Derivation of Lagrange Equations from D Alembert s Principle

PHYS 705: Classical Mechanics. Derivation of Lagrange Equations from D Alembert s Principle 1 PHYS 705: Classcal Mechancs Devaton of Lagange Equatons fom D Alembet s Pncple 2 D Alembet s Pncple Followng a smla agument fo the vtual dsplacement to be consstent wth constants,.e, (no vtual wok fo

More information

Physics 207 Lecture 16

Physics 207 Lecture 16 Physcs 07 Lectue 6 Goals: Lectue 6 Chapte Extend the patcle odel to gd-bodes Undestand the equlbu of an extended object. Analyze ollng oton Undestand otaton about a fxed axs. Eploy consevaton of angula

More information

3. A Review of Some Existing AW (BT, CT) Algorithms

3. A Review of Some Existing AW (BT, CT) Algorithms 3. A Revew of Some Exstng AW (BT, CT) Algothms In ths secton, some typcal ant-wndp algothms wll be descbed. As the soltons fo bmpless and condtoned tansfe ae smla to those fo ant-wndp, the pesented algothms

More information

Physics 1501 Lecture 19

Physics 1501 Lecture 19 Physcs 1501 ectue 19 Physcs 1501: ectue 19 Today s Agenda Announceents HW#7: due Oct. 1 Mdte 1: aveage 45 % Topcs otatonal Kneatcs otatonal Enegy Moents of Ineta Physcs 1501: ectue 19, Pg 1 Suay (wth copason

More information

VLSI IMPLEMENTATION OF PARALLEL- SERIAL LMS ADAPTIVE FILTERS

VLSI IMPLEMENTATION OF PARALLEL- SERIAL LMS ADAPTIVE FILTERS VLSI IMPLEMENTATION OF PARALLEL- SERIAL LMS ADAPTIVE FILTERS Run-Bo Fu, Paul Fotie Dept. of Electical and Copute Engineeing, Laval Univesity Québec, Québec, Canada GK 7P4 eail: fotie@gel.ulaval.ca Abstact

More information

Generating Functions, Weighted and Non-Weighted Sums for Powers of Second-Order Recurrence Sequences

Generating Functions, Weighted and Non-Weighted Sums for Powers of Second-Order Recurrence Sequences Geneatng Functons, Weghted and Non-Weghted Sums fo Powes of Second-Ode Recuence Sequences Pantelmon Stăncă Aubun Unvesty Montgomey, Depatment of Mathematcs Montgomey, AL 3614-403, USA e-mal: stanca@studel.aum.edu

More information

Chapter I Matrices, Vectors, & Vector Calculus 1-1, 1-9, 1-10, 1-11, 1-17, 1-18, 1-25, 1-27, 1-36, 1-37, 1-41.

Chapter I Matrices, Vectors, & Vector Calculus 1-1, 1-9, 1-10, 1-11, 1-17, 1-18, 1-25, 1-27, 1-36, 1-37, 1-41. Chapte I Matces, Vectos, & Vecto Calculus -, -9, -0, -, -7, -8, -5, -7, -36, -37, -4. . Concept of a Scala Consde the aa of patcles shown n the fgue. he mass of the patcle at (,) can be epessed as. M (,

More information

Correspondence Analysis & Related Methods

Correspondence Analysis & Related Methods Coespondence Analyss & Related Methods Ineta contbutons n weghted PCA PCA s a method of data vsualzaton whch epesents the tue postons of ponts n a map whch comes closest to all the ponts, closest n sense

More information

Engineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems

Engineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems Engneeng echancs oce esultants, Toques, Scala oducts, Equvalent oce sstems Tata cgaw-hll Companes, 008 Resultant of Two oces foce: acton of one bod on anothe; chaacteed b ts pont of applcaton, magntude,

More information

Khintchine-Type Inequalities and Their Applications in Optimization

Khintchine-Type Inequalities and Their Applications in Optimization Khntchne-Type Inequaltes and The Applcatons n Optmzaton Anthony Man-Cho So Depatment of Systems Engneeng & Engneeng Management The Chnese Unvesty of Hong Kong ISDS-Kolloquum Unvestaet Wen 29 June 2009

More information

P 365. r r r )...(1 365

P 365. r r r )...(1 365 SCIENCE WORLD JOURNAL VOL (NO4) 008 www.scecncewoldounal.og ISSN 597-64 SHORT COMMUNICATION ANALYSING THE APPROXIMATION MODEL TO BIRTHDAY PROBLEM *CHOJI, D.N. & DEME, A.C. Depatment of Mathematcs Unvesty

More information

5-99C The Taylor series expansion of the temperature at a specified nodal point m about time t i is

5-99C The Taylor series expansion of the temperature at a specified nodal point m about time t i is Chapte Nuecal Methods n Heat Conducton Specal opc: Contollng the Nuecal Eo -9C he esults obtaned usng a nuecal ethod dffe fo the eact esults obtaned analytcally because the esults obtaned by a nuecal ethod

More information

Integral Vector Operations and Related Theorems Applications in Mechanics and E&M

Integral Vector Operations and Related Theorems Applications in Mechanics and E&M Dola Bagayoko (0) Integal Vecto Opeatons and elated Theoems Applcatons n Mechancs and E&M Ι Basc Defnton Please efe to you calculus evewed below. Ι, ΙΙ, andιιι notes and textbooks fo detals on the concepts

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

PHYS 1443 Section 003 Lecture #21

PHYS 1443 Section 003 Lecture #21 PHYS 443 Secton 003 Lectue # Wednesday, Nov. 7, 00 D. Jaehoon Yu. Gavtatonal eld. negy n Planetay and Satellte Motons 3. scape Speed 4. lud and Pessue 5. Vaaton of Pessue and Depth 6. Absolute and Relatve

More information

Optimization Methods: Linear Programming- Revised Simplex Method. Module 3 Lecture Notes 5. Revised Simplex Method, Duality and Sensitivity analysis

Optimization Methods: Linear Programming- Revised Simplex Method. Module 3 Lecture Notes 5. Revised Simplex Method, Duality and Sensitivity analysis Optmzaton Meods: Lnea Pogammng- Revsed Smple Meod Module Lectue Notes Revsed Smple Meod, Dualty and Senstvty analyss Intoducton In e pevous class, e smple meod was dscussed whee e smple tableau at each

More information

Machine Learning 4771

Machine Learning 4771 Machne Leanng 4771 Instucto: Tony Jebaa Topc 6 Revew: Suppot Vecto Machnes Pmal & Dual Soluton Non-sepaable SVMs Kenels SVM Demo Revew: SVM Suppot vecto machnes ae (n the smplest case) lnea classfes that

More information

Thermodynamics of solids 4. Statistical thermodynamics and the 3 rd law. Kwangheon Park Kyung Hee University Department of Nuclear Engineering

Thermodynamics of solids 4. Statistical thermodynamics and the 3 rd law. Kwangheon Park Kyung Hee University Department of Nuclear Engineering Themodynamcs of solds 4. Statstcal themodynamcs and the 3 d law Kwangheon Pak Kyung Hee Unvesty Depatment of Nuclea Engneeng 4.1. Intoducton to statstcal themodynamcs Classcal themodynamcs Statstcal themodynamcs

More information

Chapter 23: Electric Potential

Chapter 23: Electric Potential Chapte 23: Electc Potental Electc Potental Enegy It tuns out (won t show ths) that the tostatc foce, qq 1 2 F ˆ = k, s consevatve. 2 Recall, fo any consevatve foce, t s always possble to wte the wok done

More information

Fatigue equivalent loads for visualization of multimodal dynamic simulations

Fatigue equivalent loads for visualization of multimodal dynamic simulations Vst the SIMULI Resouce Cente fo oe custoe exaples. atgue equvalent loads fo vsualzaton of ultodal dynac sulatons Henk Wentzel 1, Gwenaëlle Genet 1 Coespondng autho Scana Coecal Vehcles B SE-151 87, Södetälje,

More information

APPLICATIONS OF SEMIGENERALIZED -CLOSED SETS

APPLICATIONS OF SEMIGENERALIZED -CLOSED SETS Intenatonal Jounal of Mathematcal Engneeng Scence ISSN : 22776982 Volume Issue 4 (Apl 202) http://www.mes.com/ https://stes.google.com/ste/mesounal/ APPLICATIONS OF SEMIGENERALIZED CLOSED SETS G.SHANMUGAM,

More information

Physics 2A Chapter 11 - Universal Gravitation Fall 2017

Physics 2A Chapter 11 - Universal Gravitation Fall 2017 Physcs A Chapte - Unvesal Gavtaton Fall 07 hese notes ae ve pages. A quck summay: he text boxes n the notes contan the esults that wll compse the toolbox o Chapte. hee ae thee sectons: the law o gavtaton,

More information

AN ALGORITHM FOR CALCULATING THE CYCLETIME AND GREENTIMES FOR A SIGNALIZED INTERSECTION

AN ALGORITHM FOR CALCULATING THE CYCLETIME AND GREENTIMES FOR A SIGNALIZED INTERSECTION AN AGORITHM OR CACUATING THE CYCETIME AND GREENTIMES OR A SIGNAIZED INTERSECTION Henk Taale 1. Intoducton o a snalzed ntesecton wth a fedte contol state the cclete and eentes ae the vaables that nfluence

More information

Amplifier Constant Gain and Noise

Amplifier Constant Gain and Noise Amplfe Constant Gan and ose by Manfed Thumm and Wene Wesbeck Foschungszentum Kalsuhe n de Helmholtz - Gemenschaft Unvestät Kalsuhe (TH) Reseach Unvesty founded 85 Ccles of Constant Gan (I) If s taken to

More information

(8) Gain Stage and Simple Output Stage

(8) Gain Stage and Simple Output Stage EEEB23 Electoncs Analyss & Desgn (8) Gan Stage and Smple Output Stage Leanng Outcome Able to: Analyze an example of a gan stage and output stage of a multstage amplfe. efeence: Neamen, Chapte 11 8.0) ntoducton

More information

SMOOTH FLEXIBLE MODELS OF NONHOMOGENEOUS POISSON PROCESSES USING ONE OR MORE PROCESS REALIZATIONS. for all t (0, S]

SMOOTH FLEXIBLE MODELS OF NONHOMOGENEOUS POISSON PROCESSES USING ONE OR MORE PROCESS REALIZATIONS. for all t (0, S] Poceedngs of the 8 Wnte ulaton Confeence J Mason R R Hll L Mönch O Rose T Jeffeson J W Fowle eds MOOTH FLEXIBLE MOEL OF NONHOMOGENEOU POION PROCEE UING ONE OR MORE PROCE REALIZATION Mchael E uhl halaa

More information

General method to derive the relationship between two sets of Zernike coefficients corresponding to different aperture sizes

General method to derive the relationship between two sets of Zernike coefficients corresponding to different aperture sizes 960 J. Opt. Soc. A. A/ Vol. 23, No. 8/ August 2006 Shu et al. Geneal ethod to deve the elatonshp between two sets of Zene coeffcents coespondng to dffeent apetue szes Huazhong Shu and Ln Luo Laboatoy of

More information

Physics 11b Lecture #2. Electric Field Electric Flux Gauss s Law

Physics 11b Lecture #2. Electric Field Electric Flux Gauss s Law Physcs 11b Lectue # Electc Feld Electc Flux Gauss s Law What We Dd Last Tme Electc chage = How object esponds to electc foce Comes n postve and negatve flavos Conseved Electc foce Coulomb s Law F Same

More information

Tian Zheng Department of Statistics Columbia University

Tian Zheng Department of Statistics Columbia University Haplotype Tansmsson Assocaton (HTA) An "Impotance" Measue fo Selectng Genetc Makes Tan Zheng Depatment of Statstcs Columba Unvesty Ths s a jont wok wth Pofesso Shaw-Hwa Lo n the Depatment of Statstcs at

More information

Machine Learning. Spectral Clustering. Lecture 23, April 14, Reading: Eric Xing 1

Machine Learning. Spectral Clustering. Lecture 23, April 14, Reading: Eric Xing 1 Machne Leanng -7/5 7/5-78, 78, Spng 8 Spectal Clusteng Ec Xng Lectue 3, pl 4, 8 Readng: Ec Xng Data Clusteng wo dffeent ctea Compactness, e.g., k-means, mxtue models Connectvty, e.g., spectal clusteng

More information

What is LP? LP is an optimization technique that allocates limited resources among competing activities in the best possible manner.

What is LP? LP is an optimization technique that allocates limited resources among competing activities in the best possible manner. (C) 998 Gerald B Sheblé, all rghts reserved Lnear Prograng Introducton Contents I. What s LP? II. LP Theor III. The Splex Method IV. Refneents to the Splex Method What s LP? LP s an optzaton technque that

More information

Optimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time

Optimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time Intenatonal Jounal of ompute Applcatons (5 ) Volume 44 No, Apl Optmal System fo Wam Standby omponents n the esence of Standby Swtchng Falues, Two Types of Falues and Geneal Repa Tme Mohamed Salah EL-Shebeny

More information

30 The Electric Field Due to a Continuous Distribution of Charge on a Line

30 The Electric Field Due to a Continuous Distribution of Charge on a Line hapte 0 The Electic Field Due to a ontinuous Distibution of hage on a Line 0 The Electic Field Due to a ontinuous Distibution of hage on a Line Evey integal ust include a diffeential (such as d, dt, dq,

More information

A Study about One-Dimensional Steady State. Heat Transfer in Cylindrical and. Spherical Coordinates

A Study about One-Dimensional Steady State. Heat Transfer in Cylindrical and. Spherical Coordinates Appled Mathematcal Scences, Vol. 7, 03, no. 5, 67-633 HIKARI Ltd, www.m-hka.com http://dx.do.og/0.988/ams.03.38448 A Study about One-Dmensonal Steady State Heat ansfe n ylndcal and Sphecal oodnates Lesson

More information

4. Linear systems of equations. In matrix form: Given: matrix A and vector b Solve: Ax = b. Sup = least upper bound

4. Linear systems of equations. In matrix form: Given: matrix A and vector b Solve: Ax = b. Sup = least upper bound 4. Lnea systes of eqatons a a a a 3 3 a a a a 3 3 a a a a 3 3 In at fo: a a a3 a a a a3 a a a a3 a Defnton ( vecto no): On a vecto space V, a vecto no s a fncton fo V to e set of non-negatve eal nes at

More information

The Greatest Deviation Correlation Coefficient and its Geometrical Interpretation

The Greatest Deviation Correlation Coefficient and its Geometrical Interpretation By Rudy A. Gdeon The Unvesty of Montana The Geatest Devaton Coelaton Coeffcent and ts Geometcal Intepetaton The Geatest Devaton Coelaton Coeffcent (GDCC) was ntoduced by Gdeon and Hollste (987). The GDCC

More information

Optimum Settings of Process Mean, Economic Order Quantity, and Commission Fee

Optimum Settings of Process Mean, Economic Order Quantity, and Commission Fee Jounal of Applied Science and Engineeing, Vol. 15, No. 4, pp. 343 352 (2012 343 Optiu Settings of Pocess Mean, Econoic Ode Quantity, and Coission Fee Chung-Ho Chen 1 *, Chao-Yu Chou 2 and Wei-Chen Lee

More information

A. Thicknesses and Densities

A. Thicknesses and Densities 10 Lab0 The Eath s Shells A. Thcknesses and Denstes Any theoy of the nteo of the Eath must be consstent wth the fact that ts aggegate densty s 5.5 g/cm (ecall we calculated ths densty last tme). In othe

More information

An Approach to Inverse Fuzzy Arithmetic

An Approach to Inverse Fuzzy Arithmetic An Appoach to Invese Fuzzy Athmetc Mchael Hanss Insttute A of Mechancs, Unvesty of Stuttgat Stuttgat, Gemany mhanss@mechaun-stuttgatde Abstact A novel appoach of nvese fuzzy athmetc s ntoduced to successfully

More information

A Brief Guide to Recognizing and Coping With Failures of the Classical Regression Assumptions

A Brief Guide to Recognizing and Coping With Failures of the Classical Regression Assumptions A Bef Gude to Recognzng and Copng Wth Falues of the Classcal Regesson Assumptons Model: Y 1 k X 1 X fxed n epeated samples IID 0, I. Specfcaton Poblems A. Unnecessay explanatoy vaables 1. OLS s no longe

More information

The Finite Strip Method (FSM) 1. Introduction

The Finite Strip Method (FSM) 1. Introduction The Fnte Stp ethod (FS). ntoducton Ths s the ethod of se-nuecal and se-analtcal natue. t s sutale fo the analss of ectangula plates and plane-stess eleents o stuctues eng the conaton of oth. Theefoe, the

More information

9/12/2013. Microelectronics Circuit Analysis and Design. Modes of Operation. Cross Section of Integrated Circuit npn Transistor

9/12/2013. Microelectronics Circuit Analysis and Design. Modes of Operation. Cross Section of Integrated Circuit npn Transistor Mcoelectoncs Ccut Analyss and Desgn Donald A. Neamen Chapte 5 The pola Juncton Tanssto In ths chapte, we wll: Dscuss the physcal stuctue and opeaton of the bpola juncton tanssto. Undestand the dc analyss

More information

Test 1 phy What mass of a material with density ρ is required to make a hollow spherical shell having inner radius r i and outer radius r o?

Test 1 phy What mass of a material with density ρ is required to make a hollow spherical shell having inner radius r i and outer radius r o? Test 1 phy 0 1. a) What s the pupose of measuement? b) Wte all fou condtons, whch must be satsfed by a scala poduct. (Use dffeent symbols to dstngush opeatons on ectos fom opeatons on numbes.) c) What

More information

q-bernstein polynomials and Bézier curves

q-bernstein polynomials and Bézier curves Jounal of Computatonal and Appled Mathematcs 151 (2003) 1-12 q-bensten polynomals and Béze cuves Hall Ouç a, and Geoge M. Phllps b a Depatment of Mathematcs, Dokuz Eylül Unvesty Fen Edebyat Fakültes, Tınaztepe

More information

Chapter Fifiteen. Surfaces Revisited

Chapter Fifiteen. Surfaces Revisited Chapte Ffteen ufaces Revsted 15.1 Vecto Descpton of ufaces We look now at the vey specal case of functons : D R 3, whee D R s a nce subset of the plane. We suppose s a nce functon. As the pont ( s, t)

More information

Our focus will be on linear systems. A system is linear if it obeys the principle of superposition and homogenity, i.e.

Our focus will be on linear systems. A system is linear if it obeys the principle of superposition and homogenity, i.e. SSTEM MODELLIN In order to solve a control syste proble, the descrptons of the syste and ts coponents ust be put nto a for sutable for analyss and evaluaton. The followng ethods can be used to odel physcal

More information

Preference and Demand Examples

Preference and Demand Examples Dvson of the Huantes and Socal Scences Preference and Deand Exaples KC Border October, 2002 Revsed Noveber 206 These notes show how to use the Lagrange Karush Kuhn Tucker ultpler theores to solve the proble

More information

CHAPTER 15 SPECIAL PERTURBATIONS

CHAPTER 15 SPECIAL PERTURBATIONS CHAPTER 5 SPECIAL PERTURBATIONS [Ths chapte s unde developent and t a be a athe long te befoe t s coplete. It s the ntenton that t a deal wth specal petubatons, dffeental coectons, and the coputaton of

More information

CSJM University Class: B.Sc.-II Sub:Physics Paper-II Title: Electromagnetics Unit-1: Electrostatics Lecture: 1 to 4

CSJM University Class: B.Sc.-II Sub:Physics Paper-II Title: Electromagnetics Unit-1: Electrostatics Lecture: 1 to 4 CSJM Unvesty Class: B.Sc.-II Sub:Physcs Pape-II Ttle: Electomagnetcs Unt-: Electostatcs Lectue: to 4 Electostatcs: It deals the study of behavo of statc o statonay Chages. Electc Chage: It s popety by

More information

Summer Workshop on the Reaction Theory Exercise sheet 8. Classwork

Summer Workshop on the Reaction Theory Exercise sheet 8. Classwork Joned Physcs Analyss Cente Summe Wokshop on the Reacton Theoy Execse sheet 8 Vncent Matheu Contact: http://www.ndana.edu/~sst/ndex.html June June To be dscussed on Tuesday of Week-II. Classwok. Deve all

More information

VParC: A Compression Scheme for Numeric Data in Column-Oriented Databases

VParC: A Compression Scheme for Numeric Data in Column-Oriented Databases The Intenatonal Aab Jounal of Infomaton Technology VPaC: A Compesson Scheme fo Numec Data n Column-Oented Databases Ke Yan, Hong Zhu, and Kevn Lü School of Compute Scence and Technology, Huazhong Unvesty

More information

LASER ABLATION ICP-MS: DATA REDUCTION

LASER ABLATION ICP-MS: DATA REDUCTION Lee, C-T A Lase Ablaton Data educton 2006 LASE ABLATON CP-MS: DATA EDUCTON Cn-Ty A. Lee 24 Septembe 2006 Analyss and calculaton of concentatons Lase ablaton analyses ae done n tme-esolved mode. A ~30 s

More information

Exact Simplification of Support Vector Solutions

Exact Simplification of Support Vector Solutions Jounal of Machne Leanng Reseach 2 (200) 293-297 Submtted 3/0; Publshed 2/0 Exact Smplfcaton of Suppot Vecto Solutons Tom Downs TD@ITEE.UQ.EDU.AU School of Infomaton Technology and Electcal Engneeng Unvesty

More information

Queuing Network Approximation Technique for Evaluating Performance of Computer Systems with Hybrid Input Source

Queuing Network Approximation Technique for Evaluating Performance of Computer Systems with Hybrid Input Source Int'l Conf. cientific Coputing CC'7 3 Queuing Netwok Appoxiation Technique fo Evaluating Pefoance of Copute ystes with Hybid Input ouce Nozoi iyaoto, Daisuke iyake, Kaoi Katsuata, ayuko Hiose, Itau Koike,

More information

A New Approach for Deriving the Instability Potential for Plates Based on Rigid Body and Force Equilibrium Considerations

A New Approach for Deriving the Instability Potential for Plates Based on Rigid Body and Force Equilibrium Considerations Avalable onlne at www.scencedect.com Poceda Engneeng 4 (20) 4 22 The Twelfth East Asa-Pacfc Confeence on Stuctual Engneeng and Constucton A New Appoach fo Devng the Instablty Potental fo Plates Based on

More information

Chapter 3 ROBUST TOPOLOGY ERROR IDENTIFICATION. (1970), is based on a super-bus network modeling that relates the power and voltage

Chapter 3 ROBUST TOPOLOGY ERROR IDENTIFICATION. (1970), is based on a super-bus network modeling that relates the power and voltage Chapte 3 ROBUST TOOLOGY ERROR IDENTIFICATION 3. Motvaton As entoned n Chapte 2, classc state estaton, deved by Schweppe and Wldes 970, s based on a supe-bus netwo odelng that elates the powe and voltage

More information

CEEP-BIT WORKING PAPER SERIES. Efficiency evaluation of multistage supply chain with data envelopment analysis models

CEEP-BIT WORKING PAPER SERIES. Efficiency evaluation of multistage supply chain with data envelopment analysis models CEEP-BIT WORKING PPER SERIES Effcency evaluaton of multstage supply chan wth data envelopment analyss models Ke Wang Wokng Pape 48 http://ceep.bt.edu.cn/englsh/publcatons/wp/ndex.htm Cente fo Enegy and

More information

Three Algorithms for Flexible Flow-shop Scheduling

Three Algorithms for Flexible Flow-shop Scheduling Aercan Journal of Appled Scences 4 (): 887-895 2007 ISSN 546-9239 2007 Scence Publcatons Three Algorths for Flexble Flow-shop Schedulng Tzung-Pe Hong, 2 Pe-Yng Huang, 3 Gwoboa Horng and 3 Chan-Lon Wang

More information

Chapter 12 Lyes KADEM [Thermodynamics II] 2007

Chapter 12 Lyes KADEM [Thermodynamics II] 2007 Chapter 2 Lyes KDEM [Therodynacs II] 2007 Gas Mxtures In ths chapter we wll develop ethods for deternng therodynac propertes of a xture n order to apply the frst law to systes nvolvng xtures. Ths wll be

More information

COMPLEMENTARY ENERGY METHOD FOR CURVED COMPOSITE BEAMS

On the number of regions in an m-dimensional space cut by n hyperplanes

Embedding dimension estimation of high dimensional chaotic time series using distributed time delay neural network

Remember: When an object falls due to gravity its potential energy decreases.

V. Principles of Irreversible Thermodynamics. s = S - S 0 (7.3) s = = - g i, k. "Flux": = da i. "Force": = -Â g a ik k = X i. Â J i X i (7.

ISSN: [Ramalakshmi* et al., 6(1): January, 2017] Impact Factor: 4.116

4 Singular Value Decomposition (SVD)

4 Recursive Linear Predictor

Ranks of quotients, remainders and p-adic digits of matrices

19 The Born-Oppenheimer Approximation

Properties of the magnetotelluric frequency-normalised impedance over a layered medium

Event Shape Update. T. Doyle S. Hanlon I. Skillicorn. A. Everett A. Savin. Event Shapes, A. Everett, U. Wisconsin ZEUS Meeting, October 15,

Stellar Astrophysics. dt dr. GM r. The current model for treating convection in stellar interiors is called mixing length theory:

Automated Tolerance Optimization Using Feature-driven, Production Operation-based Cost Models

Applied Mathematics Letters

Elastic Collisions. Definition: two point masses on which no external forces act collide without losing any energy.

Chapter. Three Dimensions

System in Weibull Distribution

Least Squares Fitting of Data

Potential Theory. Copyright 2004

PHYS Week 5. Reading Journals today from tables. WebAssign due Wed nite

BOUNDARY-ONLY INTEGRAL EQUATION APPROACH BASED ON POLYNOMIAL EXPANSION OF PLASMA CURRENT PROFILE TO SOLVE THE GRAD-SHAFRANOV EQUATION
