Improving XOR-Dominated Circuits by Exploiting Dependencies between Operands
|
|
- Lauren Barker
- 6 years ago
- Views:
Transcription
1 Improvng XOR-Domnated Crcuts by Explotng Dependences between Operands Ajay K. Verma Paolo Ienne Ecole Polytechnque Fédérale de Lausanne (EPFL) School of Computer and Communcaton Scences CH-1015 Lausanne, Swtzerland ABSTRACT Logc synthess has made mpressve progress n the last decade and has pervaded dgtal desgn replacng almost unversally manual technques. A remarkable excepton s computer arthmetc and datapath desgn, where desgners stll rely mostly on well studed archtectures; on datapaths, n most cases, logc synthess plays at most a mnor role n the optmsaton of netlsts. A case n pont s multple addtons performed n carry-save form, such as those fundamentally consttutng parallel multplers: column compressors are usually bult explotng the regularty of the crcut and, due to the very large number of XOR operatons, are hardly optmsed further by logc synthessers. In fact, due to the shortcomngs of algebrac factorng, XOR operatons are usually left untouched by logc synthessers. In ths paper we show a general technque to optmse XOR domnated crcuts and we demonstrate ts effectveness on multpler-lke crcuts. We show that t optmses sgnfcantly the best parallel multplers by explotng complex dependences between the addenda whch escape known manual optmsatons. Netlsts correspondng to top arthmetc archtectures can ether be synthessed drectly or preprocessed through our technque before standard logc synthess: our preprocessng stage makes t possble to acheve some 20% speed mprovement. To our best knowledge, optmsatons of the type we show for multpler-lke structures have never been reported nether manually derved, n computer arthmetc lterature, nor automatcally derved, n desgn automaton lterature. 1. MOTIVATION The effcent addton of multple addenda n hardware s an extremely mportant topc n datapath desgn: not only s t a common problem n applcaton specfc datapaths for sgnal processng, but t s also one of the key desgn ssues n parallel multplers. The general approach n multplers has been known for decades [18] and conssts n usng a carry-save representaton for the multple addtons and then n employng a fast fnal adder to produce the result. A compressor tree s used for the multple addtons whch takes n nput ntegers x 1,x 2,...,x n and outputs two words C and S, such that the sum of the two words s same as the sum of nput ntegers. The advantage of usng ths s that t uses carry-save adders n ts mplementaton, whch reduce 3 ntegers to 2 ntegers n the tme taken by a full adder. Some recent work has addressed the problem of nferrng the use of the carry-save representaton and compressors beyond multplers [15, 6, 16, 17] whle a large body of lterature has been wrtten on the best ways to mplement column compressors for multplers (e.g., [13, 14]). Whle we wll dscuss n more detal some of p 7 p 6 p 5 p 4 a3 b0 p 3 a2 b0 p 2 ( ) ( ) a1 b0 p 1 (( )( ))(( ) ( ) ( )) (( )( ))( ) p 7 (a) Smple dependences among partal product bts. p 6 p 5 p 4 p 0 a3 b0 p 3 a2 b0 p 2 a1 b0 p 1 (( )( ))(( ) ( ) ( )) (( ) ( )) (( )( ))(( ) ( ) ( )) (( ) ( )) (( )( ))(( ) ( ) ( )) (( ) ( )) (b) Complex dependences among partal product bts. Fgure 1: An llustraton of varous knd of dependences among the partal product bts of a multpler. these contrbutons n a later secton, we notce that all approaches to desgn good or optmal compressors assume that the nputs of the compressor tree are ndependent of each other. However, often ths s not the case and falure to explot that leads to nferor results. Consder for nstance the multplcaton of two ntegers A and B: the partal products are generated as the product of all ndvdual bts of A and B and accumulated usng a compressor tree. As shown n Fg. 1(a) the partal product bts are naturally dependent on each other. For example, f the bts and are both 1, then the bt must also be 1. Usng ths nformaton, the expresson for the 3 rd carry bt of the fnal adder can be smplfed: The expresson for ths carry bt s ( ). Snce XY = X(Y X), where(y X) denotes the expresson (Y gven X), the carry expresson can be wrtten as (( ) ). Usng the mplcaton above, the ex- p /07/$ IEEE. 601
2 presson can be reduced to ( a2 ). A much less evdent example of dependences among partal product bts s shown n Fg. 1(b), where, explotng the dependency among the bts, the expresson for the 4 th sum bt can be smplfed as shown n Fg. 7. Arguably, explotng such dependences s not so much a problem of mprovng the desgn of the compressor trees but an ssue for the logc synthesser whch should mplement them. In fact, logc synthessers are extremely good at explotng the dependency among the nput bts of AND and OR gates; yet, they never explot any nontrval dependency among the nput bts of XOR gates. The reason behnd ths s that most heurstcs used for two-level and mult-level optmsaton assume that the nput s gven n the sum of product form. Very few heurstcs consder expressons contanng XORs (e.g., [1]); however, these heurstcs are also very restrctve n nature. One way to optmse XOR-domnated expressons would be to express all XOR expressons n terms of AND and OR gates, and then use normal heurstcs. However, ths approach both ncreases the sze of the nput expresson exponentally and the usual heurstcs produce much worse results on such expanded expressons due to the lmtatons of algebrac factorng. For ths reason, logc synthessers rewrte XOR gates n terms of ANDs and ORs only when the operands of XOR gates are very small expressons. Hence, usual synthessers are unable to explot nontrval dependences among the nput bts of XOR gates and, ultmately, do a pretty lousy optmsaton job on many arthmetc-relevant crcuts as shown n Secton 7. Broadly speakng, ths s the problem we am to fx. In the next secton, we ntroduce the central problem of selectvely rewrtng XORs, whose soluton s at the heart of our optmsaton goal. Ths makes t possble to gve a more precse formulaton to the problem we tackle. In Secton 3 we dscuss prevous work n the area and outlne relevant dfferences. Secton 4 ntroduces the estmator functon whch we use to estmate the delay and area of the crcut correspondng to an expresson. We then proceed nto the man materal by explanng n Secton 5 the basc strategy to decde whch XORs to rewrte. Fnally, Secton 6 summarses our optmsaton algorthm and Secton 7 dscusses some expermental results usng our optmsaton technque. Secton 8 contans some concludng remarks on ths work. 2. REWRITING EVERY XOR OR NOT? Informally, we already ndcated that one of the key shortcomngs of logc synthess when dealng wth XOR-domnated arthmetc crcuts s not managng to explot the dependency between operands of XOR gates. We mentoned that such a dependency would be easly exploted by tradtonal synthess heurstcs f the XORs were rewrtten n terms of ANDs and ORs, but also observed that rewrtng XORs ndscrmnately s a poor strategy. Hence, we need to understand under what condtons rewrtng XORs helps logc synthess. Two examples help us see clearer n ths matter: (1) P = + +, Q = ( ), and R = P Q. (2) P = a 4, Q = b 4, and R = P Q. The frst s an ntermedate expresson n the mplementaton of an 8 8 multpler One can see that expresson P corresponds to the majorty of the three bts,,, whle Q s true only when the party of the three bts s odd (.e., the expressons are suffcently correlated). If we use a logc synthesser (such as the Synopsys Desgn Compler) to get the fastest mplementaton of R, we get one whch has a crtcal path delay of 0.37ns and a standard-cell area of 138.2µm 2. However, f we expand the XOR n (P Q) n terms of AND and OR gates, the synthesser produces a crcutwtha crtcalpathdelay of0.26ns and a standard-cell area of 146.9µm 2 whch s 30% faster at a small area cost. Now consder the second example where P and Q do not share any varables. Here the drect synthess gves a crcut wth a crtcal path delay of 0.22ns and a standard-cell area of 58.8µm 2, whle, after expandng the fnal XOR n terms of AND and OR gates, the logc synthess results n a crcut wth a crtcal path delay of 0.27ns and a standard-cell area of 221.2µm 2. Not only does the crtcal path now ncrease, but also the synthesser s unable to recover the area and the fnal result s almost four tmes larger. The two examples show that sometmes rewrtng XOR gates n terms of AND and OR gates s useful and sometmes t s not. Our goal s therefore to recognse the cases where expandng an XOR s useful and separate them clearly from those where t s not. Our problem s therefore the followng: PROBLEM 1. Gven a crcut consstng of AND, OR, NOT, and XOR gates, fnd the lst of XOR gates whch, f rewrtten n terms of AND and OR gates, result n a crcut wth the smallest crtcal path delay. 3. STATE OF THE ART The most general approach to optmse XOR-domnated crcuts by explotng nter operand dependences s to mplement optmsaton heurstcs better than those currently employed [5]. Not much lterature exsts n ths respect; a recent example [1] has already been mentoned and conssts a technque smlar to espresso [3] but whch apples to 2-SPP expressons. These are expressons where both nputs of an XOR must be lterals, whch s clearly not the case n many crcuts of nterest, ncludng compressors. Some older work [11] has attempted to explot Bnary Decson Dagrams (BDD), but BDDs for multplcaton have been shown to have an exponental number of nodes [4] and make such technques hardly applcable. Ths s true also of more recent attempts along smlar lnes [19, 8]. Focusng on column compressors, a consderable body of work has been accumulated whle we tred to mprove ther speed [10]. Broadly, we can classfy these works nto two categores: The frst conssts of works where the counter sze nsde the compressor tree s ncreased (a full-adder s -2 counter, that s, a devce whch counts the ones on 3 nputs and produces -bt bnary count). For some tme the computer arthmetc communty was anmated by the problem of dentfyng the perfect counter (e.g., see [13]). The second conssts of work where the counter sze remans the same but the schedulng of the counters s changed. The three-greedy approach [14, 9] falls n ths category. It has been proved expermentally that the second approach s way more mportant than the frst one for large nput counts. As mentoned prevously, the major problem of such work s n gnorng completely what we are most nterested n: the correlaton across nput bts of the compressors. 4. DELAY AND AREA MODEL Before proceedng, let us ntroduce the delay and area model whch we use to estmate the crtcal path delay and hardware area of the crcut correspondng to a Boolean expresson. 602
3 Gven a Boolean expresson, frst we replace each of the XOR subexpressons by a new varable and the new expresson can be realzed n sum of product form. Next we use espresso-lke [3] heurstcs to smplfy the expresson and fnally factorze t usng a heurstc due to Brayton [2]. In the heurstc, the factorsaton problem s formulated as a Rectangle-coverng problem and a greedy algorthm known as Png-pong s used to solve the same. Although slower than other factorng technques such as Quck Factorng (QFACTOR) [12], t gves much better results. Fnally, we replace the new varables by ther orgnal XOR expressons and map the resultng expresson usng NOT gates and two-nput AND, OR, XOR gates. To estmate the crtcal path delay and hardware area, any approprate values of delay and area can be used for each gate. 5. SELECTIVE EXPANSION OF XORS As we have seen n Secton 2, whle expandng XORs sometmes makes the crcut worse n terms of both crtcal path delay and hardware area, the penalty on hardware area s more severe. The reason for ths s that expanson of XOR gate ntroduces many NOT gates and algebrac factorng does not recognse the complement of an expresson as t gnores many Boolean propertes. As an example consder the followng expresson: x = abc + abc + abc + abc, factorze(x) x = a(bc + bc)+a(bc + bc), optmse(x) y = bc + bc, x = ay + ay. Snce algebrac factorsaton fals to recognse that the expressons bc+bc and bc+bc are complements of each other, the new varable y s never ntroduced as shown above. Ths s the man reason why expandng XOR ncreases area severely. As far as crtcal path delay s concerned, only f the used heurstc s good enough, t mght be restored to the orgnal value. Our problem s to decde whether expandng an XOR gate s useful. As we have seen n the prevous secton, f the two operands of an XOR gate are sgnfcantly correlated, then expandng the XOR gate mproves the crcut delay sgnfcantly. In the next secton we wll see that sometmes t s benefcal to expand an XOR gate even wth non correlated operands due to extreme correlaton between the rest of the expresson and the operands of the XOR gate. We call the frst knd of correlaton local correlaton and the second global correlaton. In the next two subsectons we dscuss how to measure the local and global correlaton correspondng to an XOR gate. 5.1 Local Correlaton To measure the local correlaton between the operands of an XOR gate we consder the followng expanson of XOR: A B =(AB)(A + B). Ths expanson of XOR gate s nterestng as t reveals when expandng an XOR gate can be useful. For example, f A and B are such that AB =0,thenA B s equvalent to A + B. Smlarly f A and B are such that A + B =1,thenA B s equvalent to AB. More generally, we can say that f A and B are such that ether AB or A + B can be computed very quckly compared to A and B, then expandng A B n terms of AND and OR gates s useful. Let us see the results of ths test on the examples of the prevous secton. In the frst example PQ =, whch can be computed very quckly compared to P and hence expandng P Q n terms of AND and OR gates should be useful, whch sexpansonuseful (Operand A, Operand B) { X = gcd(a, B); P = A / X; Q = B / X; (D P, AR P ) = estmatedelayarea(p); (D Q, AR Q) = estmatedelayarea(q); (D PQ, AR PQ) = estmatedelayarea(pq); (D P +Q, AR P +Q) = estmatedelayarea(p + Q); ɛ = 1 - (mn(d PQ, D P +Q) / max(d P, D Q)); δ = ((AR PQ + AR P +Q) / (AR P + AR Q)) - 1; f (δ δ threshold and ɛ ɛ threshold and (δ / ɛ) κ) return true; return false; Fgure 2: Algorthm to decde when an XOR gate should be expanded due to local correlaton. s actually true, as we have seen before. On the other hand, n the second example both of the expressons PQand P +Q have longer crtcal path compared to P and Q. Hence, accordng to our test, we should not expand P Q n terms of AND and XOR gates. Ths s reflected by the penalty of expandng XOR, shown n the prevous secton. The algorthm to decde whether expandng an XOR gate s benefcal or not s shown n Fg. 2. Suppose the two operands of the XOR gate are A and B: Frst we factor both the expressons and take the common factor X out, and then the correlaton between the two quotents P and Q s checked. We estmate the delay and area of the expressons P, Q, PQ,andP + Q usng the estmator mentoned n secton 4. Based on these values we compute ɛ and δ; the frst of the two values denotes the reducton n crtcal path delay of PQ or P + Q (whchever s smaller) wth respect to crtcal path delay of P or Q (whchever s larger). The second value represents the area overhead of the expressons PQ and P + Q wth respect to P and Q. Based on these values an expanson s allowed only f the two operands have a sgnfcant correlaton (ɛ ɛ threshold), the area overhead of expanson s not huge (δ δ threshold), and the rato of the two values δ and ɛ s less than some constant κ (.e., area penalty per unt gan n crtcal path delay s small). In Secton 7, we show the effects of the constants ɛ threshold, δ threshold, andκ on the performance of the resultng crcut. 5.2 Global Correlaton As we have dscussed earler, f the nput operands of an XOR gate are correlated enough, then expandng that XOR s advantageous for performance mprovement. However, we wll see n the next example that the correlaton s not a necessary condton to gan by expandng XORs. Consder the two dfferent mplementatons of a secton of 6-bt adder shown n Fg. 3: It s easy to verfy that the two crcut descrptons correspond to equvalent modules of 6-bt adder. However Desgn Compler s unable to recognse ths somorphsm and produces sgnfcantly dfferent crcuts. The basc dfference between the two representatons s the defnton of p and c. These two are expressed usng XOR gates n the frst representaton whle, n the second, OR gates are used. Now let us check whether our approach mentoned n the earler secton s able to transform the frst representaton nto a new one equvalent to the second representaton n terms of performance: Frst consder the expresson for c. In the frst representaton c s wrtten as g p c 1. The two operands of the XOR gate are g and p c 1. Snceg and p correspond to a b and a b,they can never be true smultaneously,.e., g p c 1 =0. Accordng to the approach of prevous secton, ths mples a strong correlaton 603
4 c -1 a b c -1 a b XOR AND Adder Equatons OR AND Adder Equatons p = a XOR b p = a OR b XOR AND XOR g = a AND b z = p XOR c -1 c = g XOR (p AND c -1) XOR AND OR g = a AND b z = p XOR c -1 c = g OR (p AND c ) -1 z c Delay = 0.60 ns Hardware Area = sq. mcron z Delay = 0.56 ns Hardware Area = sq. mcron c Fgure 3: Implementaton of a secton of 6-bt rpple carry adder wthout and wth XOR gates. betwen the operands. Hence we wll expand the XOR gate n terms of AND and OR gates resultng n c = g + p c 1, whch s the same as the defnton of c n the second representaton: c = g p c 1, c =(g p c 1)(g + p c 1), c = g + p c 1 (because g p c 1 =0). Now let us consder the defnton of p n the frst example,.e., a b. Snce the nputs of the XOR gate are not correlated at all, accordng to our approach we should not expand ths XOR. However, t can be noted that f we expand the XOR gate of p usng AND and OR gates, the correspondng defnton of c wll look lke a b + a b (a + b )c 1. Any logc synthesser wll be able to understand that ths expresson s the same as a b +(a + b )c 1, whch s exactly the defnton of c n the second representaton: c = a b +(a b )c 1, c = a b +(a b )(a + b )c 1, c = a b +(a + b )c 1. Ths example shows that sometmes expandng an XOR gate s useful because there s a strong correlaton between the operands of the XOR gate and the rest of the expresson. Ths s what we have called global correlaton. In order to measure ths knd of correlaton, we consder the followng expansons of an XOR gate nsde expresson: (A B)+C =(AB C)(A + B + C), (A B)+C =(AB C)(AB + C). Usng these rules of rewrtng XOR, one can transform the second representaton of adder nto the frst one as shown below: c = a b +(a b )c 1, c =(a b c 1 a b )(a b +(a + b )c 1), c = a b +(a + b )c 1. Note that the expresson (X Y ) s the same as (X + Y ). Ths expanson s advantageous when one of the two expressons (AB C) and (A B C) s almost a tautology (.e., s true for all but a very sparse set of assgnments of Boolean varables). Ths s because n those cases (AB C) or (A B C) can be computed very quckly and the crtcal path delay of the whole expressons wll be almost same as that of the expressons (A+B +C) or (AB + C). Usng ths knd of expanson n the above case mmedately transforms the second representaton nto the orgnal rpple carry adder as shown above. A smlar llustraton of the applcaton of global correlaton can be seen n a more complex crcut known as comparator functon. A comparator takes two ntegers A and B as nputs, and outputs 1 f A > B and 0 otherwse. One way to mplement the comparator functon s to compare the most sgnfcant bts of two ntegers and f they are dfferent then output 1 or 0 dependng on whch bt s 1. However f the two bts are equal then one should compare the second most sgnfcant bt and so on. In other words the comparator functon c(a, B) can be wrtten lke ths: c(a, B) =(a n b n)a n +(a n b n)c(a, B ), c(a, B) =a nb n +(a n b n)a n 1b n 1 +, c(a, B) =a nb n +(a n b n)a n 1b n 1 +. A more effcent way to mplement comparator functon s to subtract B from A and output the sgn bt (.e., the most sgnfcant bt) of the result. In other words, the comparator functon corresponds to the carry output of A B(the subtracton beng A + NOT(B)+ 1). Snce the expresson for the carry output can be wrtten usng propagate (p) and generate (g) functons, the expresson for comparator wll look lke ths: c(a, B) =g n + p ng n p np n 1 p 1, where (g n,p n)=(a nb n,a n + b n). Therefore c(a, B) =a nb n +(a n + b n)a n 1b n 1 +. Once agan, the dfference between the two mplementaton s that the frst mplementaton uses XORs, whle the second one uses ORs. It s easy to see that usng global correlaton the frst mplementaton can be converted nto the second one. c(a, B) =a nb n +(a n b n)a n 1b n 1 +, c(a, B) =(a nb na n 1b n 1 a nb n) (a nb n +(a nb n)a n 1b n 1)+, c(a, B) =a nb n +(a n + b n)a n 1b n 1 +. In order to ensure that the resultng crcut does not have too much area overhead, we use the parameters ɛ threshold, δ threshold, and κ here also. In other words, an expanson s allowed only f t mproves the delay at least by ɛ threshold, does not ncrease area more than δ threshold, and the rato of area overhead and delay gan s less than κ. 604
5 getschedulng (Lst L, functon f()) { f (sze(l) = 1) { return top(l); (x 1, x 2) = fndmnpar(l, f); // fnd x 1, x 2 L such that // f(x 1, x 2) s mnmum. z = x 1 x 2; L = (L {z) - {x 1, x 2; return getschedulng(l); Fgure 4: A greedy heurstc to compute the XOR of n expressons where the latency of an XOR gate depends on ts nputs. 5.3 Mergng Local Correlaton and Schedulng Algorthm It s easy to note that the order n whch XOR gates are expanded s also mportant and affects the fnal result. Ths s because the reducton system correspondng to expandng XORs s not persstent and the expanson of an XOR gate mght make the expanson of another one dsdvantageous. In some cases even locally useful expansons may worsen the overall performance of the crcut. As an example consder the Boolean expresson (A B C),where the nput arrval tmes of A, B and C are 1, 1, and2. Assume that the latency of an XOR gate s 0.13 ns and among all XOR gates only (B C) s benefcal to expand and results n a reducton of 0.08 ns. In other words the drect mplementaton of (B C) wll have a delay of 2.13 ns; however, after expandng the XOR the correspondng expresson wll have a delay of 2.05 ns. Now consder two dfferent ways of computng (A B C): (A B) C delay = max( max(1, 1), 2) = 2.13, A (B C) delay = max(1, 2.05) =2.18. Note that the second mplementaton, where XOR gate n (B C) s expanded, has a longer crtcal path compared to the frst mplementaton where no XOR gate was expanded. Ths s because the second crcut s more lopsded than the frst one. Ths ndcates that we also need to consder the nput arrval tmes of the operands of an XOR whle decdng whether ts expanson s useful. In other words, the problem s to fnd a schedulng and then expandng those XORs whch have correlated nputs, the resultng expresson has the least crtcal path delay. More formally we can formulate the problem as follows. PROBLEM 2. Gven a lst of n expressons X 1,X 2,,X n, and a functon f(.) whch takes two expressons and returns the tme when ther XOR can be computed, fnd an optmal schedulng to compute the XORs of the n expressons. Note that f no expanson of XOR gates s allowed then the functon f(.) wll be a trval functon and can be wrtten lke f(x, y) = D XOR + max(t x,t y),wheret x, t y are the arrval tmes of expressons x and y. In ths case the problem can be solved optmally usng a two-greedy strategy: The algorthm takes a lst of expressons and chooses the two expressons from the lst whch have the least arrval tme; t then removes them from the lst and nserts a new varable correspondng to ther XOR. Ths process s repeated tll the sze of the lst s reduced to one. However, when XOR-expanson s allowed, the functon f(.) denotes the tme by whch XOR of two expressons can be computed (ether by usng an XOR gate or by rewrtng the XOR usng AND and OR gates whchever s better n terms of delay). Ths means that the behavour of f(.) s largely unknown. The only restrcton s that f(x, y) D XOR+max(t x,t y). In order to solve the problem wth general f(.), we use a slght varaton of the two-greedy strategy as shown n Fg. 4. The only dfference s that n each teraton, nstead of choosng the two varables wth shortest arrval tmes, we choose the par whose XOR can be computed earlest among all pars. Although ths heurstc wll not always produce optmal results, practcally t mproves the performance of a crcut sgnfcantly. Also, t can be proved that ths heurstc wll never worsen a crcut,.e., the crcut produced by the heurstc wll have a crtcal path no longer than the crtcal path n optmal schedulng of the crcut wthout expandng any XOR gate. More formally: THEOREM 1. If the XOR of n expressons X 1,X 2,,X n can be computed n tme T wthout expandng any XOR gate, then the above mentoned greedy heurstc of Fg. 4 wll result n a crcut wth crtcal path delay T T. PROOF. Instead of provng the above theorem we wll prove the followng more general statement. Assume two lsts of n expressons X 1, X 2,, X n and Y 1, Y 2,, Y n, wth arrval tmes t x1, t x2,, t xn and t y1, t y2,, t yn respectvely, and t y t x for all. If the XOR of X 1, X 2,, X n s computed wthout expandng any XOR n T x tme and the XOR of Y 1, Y 2,, Y n s computed usng the above mentoned greedy heurstc n T y tme, then T y T x. It s easy to note that provng ths statment suffces to prove the theorem. The proof s by nducton on n. The base case corresponds to n =1, whch s trval. Assume that the statement s true for (n 1) expressons and that the latency of XOR gate s t. Wthout loss of generalty, we can also assume that the arrval tmes of expressons X 1, X 2,, X n as well as the arrval tmes of expressons Y 1, Y 2,, Y n are n non-decreasng order. We have mentoned before that f the expanson of XOR s not allowed, then the two greedy strategy s optmal; hence, n the frst step to compute the XORs of X 1, X 2,, X n, the optmal strategy computes the XOR of X 1 and X 2. After ths step, the lst of operands wll look lke X new, X 3,, X n, wth arrval tmes t + t X2, t X3,, t Xn. On the other hand, n the frst step to compute XORs of Y 1, Y 2,, Y n, the greedy strategy wll compute the XOR of Y and Y j,where(y,y j) s the par whose XOR can be computed earlest among all possble pars. After ths step, the lst of operands wll be Y new, Y 1,, Y 1, Y +1,, Y j 1, Y j+1, Y n, wth arrval tmes t new, t Y 1,, t Yn. Accordng to the defnton of greedy heurstc, we can see that α, β(t new f(y α,y β )) α, β(t new t + max(t Yα,t Yβ )) t new t + max(t Y 1,t Y 2) t new t + max(t X1,t X2) t new t + t X2. Also note that t Y 1 t X1 t X and t Y 2 t X2 t Xj. Ths means that after one step also the arrval tmes of expressons satsfy the property t Yk t Xk. But after one step we have (n 1) expressons, and now we can say by the nducton hypothess that the XOR of the X s can not be computed any sooner than the XOR of Y s. Ths completes the proof. 605
6 get XOR Delay (X, Y) { P = gcd(x, Y); (D X, AR X) = estmatedelayarea(x); (D Y, AR Y ) = estmatedelayarea(y); (D new, AR new) = estmatedelayarea(p((x/p )(Y/P))(X/P + Y/P)); f ((sexpansonuseful(x, Y)) return D new; else return (t XOR + max(d X, D Y )); optmzeusnglocalcorrelaton (tree T) { forall chldren T of T do optmzebyexpandng(t ); f (op(root(t)) = XOR) return (tree)(getschedulng( {(expr)t 1,, (expr)t n, get XOR Delay())); // return the tree correspondng to the // schedulng of XORs. optmzeusngglobalcorrelaton (tree T) { forall nodes n of T n reverse topologcal order do //.e., traverse the tree from leaves to root f (op(n) = OR and (op(n.leftchld) = XOR or op(n.rghtchld) = XOR)) rewrteusngglobalcorrelaton(subtree(n)); // Check for the subtree rooted at n f the // expanson based on global correlaton has // lower crtcal path delay, f so replace // the subtree by the expanson based on // global correlaton. optmzetree (tree T) { whle (mprovement n performance) { collapsetree(t); // collapse the tree usng the assocatvty // of the operators, so that the successor of a // node must be of dfferent type than ths node. optmzeusnglocalcorrelaton(t); optmzeusngglobalcorrelaton(t); Fgure 5: Algorthm for computng the optmsed expresson by selectve expanson of XOR gates. 6. ALGORITHM AND COMPUTATIONAL COMPLEXITY The complete algorthm s shown n Fg. 5. The algorthm takes the nput expresson n the form of a tree where each node corresponds to a Boolean operator and the edges correspond to data dependences. Frst, ths nput tree s collapsed by mergng adjacent nodes of same type nto a sngle mult-nput node (e.g., multnput AND etc.). After that, the tree s optmsed usng local and global correlatons. The optmsaton usng local correlaton conssts of traversng the tree n topologcal order from leaves to root and optmse the subtrees rooted at XOR nodes usng the greedy heurstc mentoned n the earler secton. Then, we check whether any XOR gate can be expanded usng the global correlaton crtera to mprove the performance of the crcut and, f so, rewrte the correspondng subexpresson usng the expanson based on global correlaton. The above two steps are repeated tll we have an mprovement n the delay of the crcut. In order to analyse the tme complexty of the algorthm we need to fnd out how many tmes we are callng the functon estmatedelayarea() because that s the most expensve functon. It s easy to see that n the functon optmzeusngglobalcorrelaton() we call the estmator functon O(n) tmes, where n s the number of nodes n the tree. However, n the functon opt- ADPCM Decoder Unoptmsed µm ns Selectve Expanson µm ns 8 8-bt Multpler DesgnWare µm ns Three Greedy Approach µm ns Selectve Expanson (TGA as nput) µm ns Constant Multplcaton (A 7) A µm ns A +2A +4A µm ns 8A A µm ns Selectve Expanson (A +2A +4Aas nput) µm ns Selectve Expanson (8A Aas nput) µm ns 15-bt Comparator Unoptmsed 514.9µm ns Selectve Expanson 466.2µm ns 15-bt Adder Unoptmsed µm ns Selectve Expanson µm ns Table 1: Optmsaton results for all our benchmarks. ɛ threshold δ threshold κ Delay (ns) Hardware Area (µm 2 ) Table 2: Illustraton of the moderate effects of varous parameters on the fnal result output by Selectve Expanson for 8 8- bt multpler. mzeusnglocalcorrelaton() there are O(n 2 ) calls to the estmator functon. To see ths, frst let us fnd the tme complexty of the getschedulng() functon. In the getschedulng() functon, the sze of the nput lst decreases by one n each teraton (.e., f the ntal sze s m, then there wll be (m 1) teratons). Also, n each teraton, we need to fnd the par whose XOR can be computed earlest among all pars. Note that n the frst teraton we need to compute the effectve XOR latency of all pars. However, n subsequent teratons, only O(k) new pars are generated, where k s the lst sze at the begnng of that teraton,.e., the estmator functon s called only O(m 2 ) tmes. Snce we call the getschedulng() functon at each XOR node, the total number of calls to the estmator functon wll be O(n 2 ). Also, we observed emprcally that we need to terate the functons optmzeusng- GlobalCorrelaton() and optmzeusnglocalcorrelaton() only a constant number of tmes. Hence the overall complexty of the algorthm s O(n 2 T estmator), wheret estmator s the tme complexty to estmate the delay and area of an expresson. 7. EXPERIMENTS We mplemented our algorthm usng Maple 10, whch s used as a front-end to Synopsys Desgn Compler. We synthesse the nput crcut twce: once drectly and a second tme after optmsng usng our algorthm, whch we call as Selectve Expanson. All the crcuts are synthessed usng a common standard-cell lbrary for UMC 0.13µm CMOS technology. Table 1 shows the results of our mplementatons. There are qualtatvely fve dfferent arthmetc crcuts on whch we ran our algorthm: The frst crcut s the kernel of adpcmdecode [7]. Here the nput s the already optmsed crcut usng the technques mentoned n [17]. As we can see, our algorthm mproves the delay by 15% wth a margnal penalty of 606
7 Delay (ns) Three Greedy Approach Desgnware Full Expanson Selectve Expanson Bt Poston Fgure 6: Comparson of btwse delays of the multpler generated by DesgnWare, Three Greedy Approach, Selectve Expanson and Full Expanson. z 3 = ( ) ( + ( + ) ); // After smple optmsatons based on smple // dependences among partal product bts. z 3 = ( ) ( + ( + ) ); // Usng the dependency between XOR operands // ( ) (( )) = 0. z 3 = (( ) + ( )) ( + ( + ) ); // (PQR S) P (Q R) + S = P (Q + R) + S // Here P =, Q =, R =, S = z 3 = (( ) + ( + )) ( + ( + ) ); // Manually optmsed expresson z 3 = (( ) + ) ( + ( + ) ); Fgure 7: A partal executon of our algorthm to optmse the 4 th output bt of 8 8-bt multpler. 5% n terms of hardware area. The second benchmark conssts of an 8 8-bt parallel multpler. Here we compare the results of Selectve Expanson wth the DesgnWare mplementaton and the Three Greedy Approach (TGA) [14]. The results show that the Three Greedy approach, whch s already much better than the DesgnWare mplementaton, s outperformed by Selectve Expanson and an mprovement of 20% n delay s acheved. In Fg 6, we compare the delay values of ndvdual product bt expressons. Note that Selectve Expanson always produces better results compared to DesgnWare and TGA. Compared wth Full Expanson (where all XOR gates are expanded), Selectve Expanson performs better for all but the three most sgnfcant bt expressons; however, the area penalty n Full Expanson s huge (almost 20 tmes the area ncurred by Selectve Expanson). Table 2 shows the effect of varous parameters on the output results for the 8 8-bt multpler generated by Selectve Expanson algorthm. Notce that the delay values for all parameters are pretty close to each other. For other benchmarks we use the combnaton ɛ threshold =0.1,δ threshold =0.2, and κ =2. The thrd benchmark s a constant multpler where 6-bt nteger A s multpled by the constant 7. Note that (A 7) can be mplemented n varous ways: One possblty can be to use the multplcaton provded by Desgn Compler, whle another s to rewrte (A 7) as (A +2A +4A) and then use a carry save adder to add the three values. An even better way s to rewrte (A 7) as (8A A) and use a sngle adder to add them. As we can see the, results of the three mplementatons are very dfferent from each other. We ran Selectve Expanson algorthm on both forms of nputs (A+2A+4A) and (8A A). Note that the resultng crcuts for the two nputs are almost same n terms of crtcal path delay. The fourth benchmark s the comparator functon mentoned n secton 5.2. One can see that n ths benchmark we not only reduce crtcal path delay but also optmse the crcut n terms of hardware area. The last arthmetc crcut corresponds to the 15-bt adder. Although t s an XOR-domnated crcut, the XOR operands are ndependent of each other and hence Selectve Expanson does not expand any of them. Ths means that the adder crcut wll reman unchanged by the Selectve Expanson, whch s n fact reflected by the results. A partal executon of our algorthm to optmse the expresson for the 4 th bt of the output of an 8 8-bt multpler s shown n Fg. 7. Note that the expresson optmsed by our algorthm s very smlar to the manually optmsed expresson. 8. CONCLUSIONS Although logc synthess has made enormous progress and has supplanted manual desgn n the great majorty of dgtal desgn tasks, computer arthmetcs s possbly the last area where people stll manually or sem manually optmse crcuts. Many practcal arthmetc crcuts, both standard and applcaton specfc, are domnated by addtons; equvalently, at the Boolean level, they contan a large number of nested XORs. We have shown how classc lmtatons of logc synthess heurstcs can be crcumvented by selectvely rewrtng only some XOR operatons n terms of ANDs and ORs. Whle synthessers seldom rewrte XOR operatons, our heurstc tres to do that n all cases where one can observe some specfc form of dependency between the two XOR operands: when ths s the case, such rewrtng wll lkely let classc synthess heurstcs acheve better results. In our experments we have used state-of-the-art logc descrptons of the benchmarks, where all classc archtectural optmsatons have been appled already, and we have measured up to more than a ffth reducton n crtcal path for a small area cost. Ths mprovement on already optmsed components would hardly be belevable, were t not for the qualtatve realzaton that the solutons found by our algorthm on a parallel multpler, for nstance, correspond to a type of optmsaton that we have never found descrbed n computer arthmetc lterature probably due to ther rregularty and the complexty of mplementng them manually. Our results llustrate that t s hgh tme for desgn automaton to conquer arthmetc terrtores, so far largely left to manual and archtectural optmsatons. Wth the ncreasng number of softwareorented desgners mplementng complex sgnal and meda processng algorthms on FPGAs or ASICs, automatcally achevng mplementatons of arthmetc components whch are at least equal to manual mplementatons becomes a must. And, as we show, better-than-manual desgns are a concrete possblty. 9. REFERENCES [1] A. Bernascon, V. Cran, R. Drechsler, and T. Vlla. Effcent mnmzaton of fully testable 2-SPP networks. In Proceedngs of the Desgn, Automaton and Test n Europe Conference and Exhbton, Munch, Mar [2] R. K. Brayton. Factorng logc functons. IBM Journal of Research and Development, 31(2):187 98, Mar [3] R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. L. 607
8 Sangovann Vncentell. Logc Mnmzaton Algorthms for VLSI Synthess. Kluwer Academc, Boston, Mass., [4] R. E. Bryant. The complexty of VLSI mplementatons and graph representatons of Boolean functons wth applcaton to nteger multplcaton. IEEE Transactons on Computers, 40(2):205 13, Feb [5] G. De Mchel. Synthess and Optmzaton of Dgtal Crcuts. McGraw-Hll, New York, [6] T. Km, W. Jao, and S. Tjang. Crcut optmzaton usng carry-save-adder cells. IEEE Transactons on Computer-Aded Desgn of Integrated Crcuts and Systems, CAD-17(10):974 84, Oct [7] C. Lee, M. Potkonjak, and W. H. Mangone-Smth. MedaBench: A tool for evaluatng and syntheszng multmeda and communcatons systems. In Proceedngs of the 30th Annual Internatonal Symposum on Mcroarchtecture, pages , Research Trangle Park, N.C., Dec [8] A. Mshchenko and M. Perkowsk. Fast heurstc mnmzaton of exclusve-sums-of-product. In Proceedngs of the 5th Internatonal Reed-Muller Workshop, Los Angeles, Calf., Aug [9] V. G. Oklobdzja, D. Vlleger, and S. S. Lu. A method for speed optmzed partal product reducton and generaton of fast parallel multplers usng an algorthmc approach. IEEE Transactons on Computers, C-45(3): , Mar [10] A. R. Omond. Computer Arthmetc Systems. Prentce Hall, New York, [11] T. Sasao. An exact mnmzaton of AND-EXOR expressons usng BDD s. In Proccedngs of IFIP WG 10.5 Workshop on Applcatons of the Reed-Muller Expanson n Crcut Desgn, [12] E. Sentovch, K. Sngh, L. Lavagno, M. C., R. Murga, A. Saldanha, S. H., S. P., R. Brayton, and A. Sangovann Vncentell. SIS: A system for sequental crcut synthess. Techncal report, Unversty of Calforna, Berkeley, Calf., [13] P. Song and G. De Mchel. Crcut and archtecture trade-offs for hgh speed multplcaton. IEEE Journal of Sold-State Crcuts, 26(9), Sept [14] P. F. Stellng, C. U. Martel, V. G. Oklobdzja, and R. Rav. Optmal crcuts for parallel multplers. IEEE Transactons on Computers, C-47(3):273 85, Mar [15] Synopsys. Creatng Hgh-Speed Data-Path Components Applcaton Note, Aug Verson [16] J. Um and T. Km. An optmal allocaton of carry-save-adders n arthmetc crcuts. IEEE Transactons on Computers, C-50(3):215 33, Mar [17] A. K. Verma and P. Ienne. Improved use of the carry-save representaton for the synthess of complex arthmetc crcuts. In Proceedngs of the Internatonal Conference on Computer Aded Desgn, pages , San Jose, Calf., Nov [18] C. S. Wallace. A suggeston for a fast multpler. IEEE Transactons on Electronc Computers, C-13(2):14 17, Feb [19] C. Yang, M. J. Ceselsk, and V. Snghal. A BDD-based logc optmzaton system. In Proceedngs of the 37th Desgn Automaton Conference, pages 92 97, Los Angeles, Calf., June
TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION
1 2 MULTIPLIERLESS FILTER DESIGN Realzaton of flters wthout full-fledged multplers Some sldes based on support materal by W. Wolf for hs book Modern VLSI Desgn, 3 rd edton. Partly based on followng papers:
More informationA New Design of Multiplier using Modified Booth Algorithm and Reversible Gate Logic
Internatonal Journal of Computer Applcatons Technology and Research A New Desgn of Multpler usng Modfed Booth Algorthm and Reversble Gate Logc K.Nagarjun Department of ECE Vardhaman College of Engneerng,
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationDecision Diagrams Derivatives
Decson Dagrams Dervatves Logc Crcuts Desgn Semnars WS2010/2011, Lecture 3 Ing. Petr Fšer, Ph.D. Department of Dgtal Desgn Faculty of Informaton Technology Czech Techncal Unversty n Prague Evropský socální
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationSpeeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem
H.K. Pathak et. al. / (IJCSE) Internatonal Journal on Computer Scence and Engneerng Speedng up Computaton of Scalar Multplcaton n Ellptc Curve Cryptosystem H. K. Pathak Manju Sangh S.o.S n Computer scence
More informationCollege of Computer & Information Science Fall 2009 Northeastern University 20 October 2009
College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationNP-Completeness : Proofs
NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More informationDepartment of Electrical & Electronic Engineeing Imperial College London. E4.20 Digital IC Design. Median Filter Project Specification
Desgn Project Specfcaton Medan Flter Department of Electrcal & Electronc Engneeng Imperal College London E4.20 Dgtal IC Desgn Medan Flter Project Specfcaton A medan flter s used to remove nose from a sampled
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationA Novel, Low-Power Array Multiplier Architecture
A Noel, Low-Power Array Multpler Archtecture by Ronak Bajaj, Saransh Chhabra, Sreehar Veeramachanen, MB Srnas n 9th Internatonal Symposum on Communcaton and Informaton Technology 29 (ISCIT 29) Songdo -
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationClock-Gating and Its Application to Low Power Design of Sequential Circuits
Clock-Gatng and Its Applcaton to Low Power Desgn of Sequental Crcuts ng WU Department of Electrcal Engneerng-Systems, Unversty of Southern Calforna Los Angeles, CA 989, USA, Phone: (23)74-448 Massoud PEDRAM
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationCalculation of time complexity (3%)
Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add
More informationStanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011
Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationECE559VV Project Report
ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate
More informationLINEAR TRANSFORMATION OF BINARY DECISION DIAGRAMS TROUGH SPECTRAL DOMAIN
LINEAR TRANSFORMATION OF BINARY DECISION DIAGRAMS TROUGH SPECTRAL DOMAIN Mlena Stankovc, Suzana Stokovc 2 Faculty of Electronc Engneerng, Unversty of Ns, A Medvedeva 4, 8 Ns, SERBIA, mstankovc@elfaknacyu,
More informationSection 8.3 Polar Form of Complex Numbers
80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the
More informationFormulas for the Determinant
page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use
More informationRepresentations of Elementary Functions Using Binary Moment Diagrams
Representatons of Elementary Functons Usng Bnary Moment Dagrams Tsutomu Sasao Department of Computer Scence and Electroncs, Kyushu Insttute of Technology Izua 82-852, Japan Shnobu Nagayama Department of
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationFundamental loop-current method using virtual voltage sources technique for special cases
Fundamental loop-current method usng vrtual voltage sources technque for specal cases George E. Chatzaraks, 1 Marna D. Tortorel 1 and Anastasos D. Tzolas 1 Electrcal and Electroncs Engneerng Departments,
More informationThe Order Relation and Trace Inequalities for. Hermitian Operators
Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence
More informationTHE SUMMATION NOTATION Ʃ
Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the
More informationOn the Repeating Group Finding Problem
The 9th Workshop on Combnatoral Mathematcs and Computaton Theory On the Repeatng Group Fndng Problem Bo-Ren Kung, Wen-Hsen Chen, R.C.T Lee Graduate Insttute of Informaton Technology and Management Takmng
More informationAnnexes. EC.1. Cycle-base move illustration. EC.2. Problem Instances
ec Annexes Ths Annex frst llustrates a cycle-based move n the dynamc-block generaton tabu search. It then dsplays the characterstcs of the nstance sets, followed by detaled results of the parametercalbraton
More informationSingle-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition
Sngle-Faclty Schedulng over Long Tme Horzons by Logc-based Benders Decomposton Elvn Coban and J. N. Hooker Tepper School of Busness, Carnege Mellon Unversty ecoban@andrew.cmu.edu, john@hooker.tepper.cmu.edu
More informationFoundations of Arithmetic
Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an
More informationChapter 13: Multiple Regression
Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationGraph Reconstruction by Permutations
Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer
More informationExercises. 18 Algorithms
18 Algorthms Exercses 0.1. In each of the followng stuatons, ndcate whether f = O(g), or f = Ω(g), or both (n whch case f = Θ(g)). f(n) g(n) (a) n 100 n 200 (b) n 1/2 n 2/3 (c) 100n + log n n + (log n)
More informationThe Minimum Universal Cost Flow in an Infeasible Flow Network
Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationModule 2. Random Processes. Version 2 ECE IIT, Kharagpur
Module Random Processes Lesson 6 Functons of Random Varables After readng ths lesson, ou wll learn about cdf of functon of a random varable. Formula for determnng the pdf of a random varable. Let, X be
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationMA 323 Geometric Modelling Course Notes: Day 13 Bezier Curves & Bernstein Polynomials
MA 323 Geometrc Modellng Course Notes: Day 13 Bezer Curves & Bernsten Polynomals Davd L. Fnn Over the past few days, we have looked at de Casteljau s algorthm for generatng a polynomal curve, and we have
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More informationCOMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
More informationMessage modification, neutral bits and boomerangs
Message modfcaton, neutral bts and boomerangs From whch round should we start countng n SHA? Antone Joux DGA and Unversty of Versalles St-Quentn-en-Yvelnes France Jont work wth Thomas Peyrn 1 Dfferental
More informationAffine transformations and convexity
Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationGeneral theory of fuzzy connectedness segmentations: reconciliation of two tracks of FC theory
General theory of fuzzy connectedness segmentatons: reconclaton of two tracks of FC theory Krzysztof Chrs Ceselsk Department of Mathematcs, West Vrgna Unversty and MIPG, Department of Radology, Unversty
More information20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness.
20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The frst dea s connectedness. Essentally, we want to say that a space cannot be decomposed
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationCHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION
CAPTER- INFORMATION MEASURE OF FUZZY MATRI AN FUZZY BINARY RELATION Introducton The basc concept of the fuzz matr theor s ver smple and can be appled to socal and natural stuatons A branch of fuzz matr
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationLecture 5 Decoding Binary BCH Codes
Lecture 5 Decodng Bnary BCH Codes In ths class, we wll ntroduce dfferent methods for decodng BCH codes 51 Decodng the [15, 7, 5] 2 -BCH Code Consder the [15, 7, 5] 2 -code C we ntroduced n the last lecture
More informationHigh Performance Rotation Architectures Based on the Radix-4 CORDIC Algorithm
IEEE TRANSACTIONS ON COMPUTERS, VOL. 46, NO. 8, AUGUST 997 855 Hgh Performance Rotaton Archtectures Based on the Radx-4 CORDIC Algorthm Elsardo Antelo, Julo Vllalba, Javer D. Bruguera, and Emlo L. Zapata
More informationLow Complexity Soft-Input Soft-Output Hamming Decoder
Low Complexty Soft-Input Soft-Output Hammng Der Benjamn Müller, Martn Holters, Udo Zölzer Helmut Schmdt Unversty Unversty of the Federal Armed Forces Department of Sgnal Processng and Communcatons Holstenhofweg
More informationThe optimal delay of the second test is therefore approximately 210 hours earlier than =2.
THE IEC 61508 FORMULAS 223 The optmal delay of the second test s therefore approxmately 210 hours earler than =2. 8.4 The IEC 61508 Formulas IEC 61508-6 provdes approxmaton formulas for the PF for smple
More informationSome Consequences. Example of Extended Euclidean Algorithm. The Fundamental Theorem of Arithmetic, II. Characterizing the GCD and LCM
Example of Extended Eucldean Algorthm Recall that gcd(84, 33) = gcd(33, 18) = gcd(18, 15) = gcd(15, 3) = gcd(3, 0) = 3 We work backwards to wrte 3 as a lnear combnaton of 84 and 33: 3 = 18 15 [Now 3 s
More information= z 20 z n. (k 20) + 4 z k = 4
Problem Set #7 solutons 7.2.. (a Fnd the coeffcent of z k n (z + z 5 + z 6 + z 7 + 5, k 20. We use the known seres expanson ( n+l ( z l l z n below: (z + z 5 + z 6 + z 7 + 5 (z 5 ( + z + z 2 + z + 5 5
More informationLearning Theory: Lecture Notes
Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be
More informationELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM
ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look
More informationTemperature. Chapter Heat Engine
Chapter 3 Temperature In prevous chapters of these notes we ntroduced the Prncple of Maxmum ntropy as a technque for estmatng probablty dstrbutons consstent wth constrants. In Chapter 9 we dscussed the
More informationDesign and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm
Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:
More informationLecture 2: Gram-Schmidt Vectors and the LLL Algorithm
NYU, Fall 2016 Lattces Mn Course Lecture 2: Gram-Schmdt Vectors and the LLL Algorthm Lecturer: Noah Stephens-Davdowtz 2.1 The Shortest Vector Problem In our last lecture, we consdered short solutons to
More informationThe L(2, 1)-Labeling on -Product of Graphs
Annals of Pure and Appled Mathematcs Vol 0, No, 05, 9-39 ISSN: 79-087X (P, 79-0888(onlne Publshed on 7 Aprl 05 wwwresearchmathscorg Annals of The L(, -Labelng on -Product of Graphs P Pradhan and Kamesh
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationNotes on Frequency Estimation in Data Streams
Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationSuppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl
RECURSIVE SPLINE INTERPOLATION METHOD FOR REAL TIME ENGINE CONTROL APPLICATIONS A. Stotsky Volvo Car Corporaton Engne Desgn and Development Dept. 97542, HA1N, SE- 405 31 Gothenburg Sweden. Emal: astotsky@volvocars.com
More informationLecture 3 Stat102, Spring 2007
Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture
More informationThe Synchronous 8th-Order Differential Attack on 12 Rounds of the Block Cipher HyRAL
The Synchronous 8th-Order Dfferental Attack on 12 Rounds of the Block Cpher HyRAL Yasutaka Igarash, Sej Fukushma, and Tomohro Hachno Kagoshma Unversty, Kagoshma, Japan Emal: {garash, fukushma, hachno}@eee.kagoshma-u.ac.jp
More informationExample: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41,
The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no confuson
More informationMEM 255 Introduction to Control Systems Review: Basics of Linear Algebra
MEM 255 Introducton to Control Systems Revew: Bascs of Lnear Algebra Harry G. Kwatny Department of Mechancal Engneerng & Mechancs Drexel Unversty Outlne Vectors Matrces MATLAB Advanced Topcs Vectors A
More informationCase A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.
THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty
More informationA quote of the week (or camel of the week): There is no expedience to which a man will not go to avoid the labor of thinking. Thomas A.
A quote of the week (or camel of the week): here s no expedence to whch a man wll not go to avod the labor of thnkng. homas A. Edson Hess law. Algorthm S Select a reacton, possbly contanng specfc compounds
More informationSplit alignment. Martin C. Frith April 13, 2012
Splt algnment Martn C. Frth Aprl 13, 2012 1 Introducton Ths document s about algnng a query sequence to a genome, allowng dfferent parts of the query to match dfferent parts of the genome. Here are some
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationUncertainty in measurements of power and energy on power networks
Uncertanty n measurements of power and energy on power networks E. Manov, N. Kolev Department of Measurement and Instrumentaton, Techncal Unversty Sofa, bul. Klment Ohrdsk No8, bl., 000 Sofa, Bulgara Tel./fax:
More information8.6 The Complex Number System
8.6 The Complex Number System Earler n the chapter, we mentoned that we cannot have a negatve under a square root, snce the square of any postve or negatve number s always postve. In ths secton we want
More informationOn the Multicriteria Integer Network Flow Problem
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of
More informationAn Interactive Optimisation Tool for Allocation Problems
An Interactve Optmsaton ool for Allocaton Problems Fredr Bonäs, Joam Westerlund and apo Westerlund Process Desgn Laboratory, Faculty of echnology, Åbo Aadem Unversty, uru 20500, Fnland hs paper presents
More informationFaster Searching by Elimination
Faster Searchng by Elmnaton Theodore S. Norvell Electrcal and Computer Engneerng Memoral Unversty December 6, 010 Abstract The SIMPLE system, under development at Memoral Unversty, allows abstract problem
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationTornado and Luby Transform Codes. Ashish Khisti Presentation October 22, 2003
Tornado and Luby Transform Codes Ashsh Khst 6.454 Presentaton October 22, 2003 Background: Erasure Channel Elas[956] studed the Erasure Channel β x x β β x 2 m x 2 k? Capacty of Noseless Erasure Channel
More informationIntroduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:
CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and
More informationfind (x): given element x, return the canonical element of the set containing x;
COS 43 Sprng, 009 Dsjont Set Unon Problem: Mantan a collecton of dsjont sets. Two operatons: fnd the set contanng a gven element; unte two sets nto one (destructvely). Approach: Canoncal element method:
More informationLinear Regression Analysis: Terminology and Notation
ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented
More informationVariability-Driven Module Selection with Joint Design Time Optimization and Post-Silicon Tuning
Asa and South Pacfc Desgn Automaton Conference 2008 Varablty-Drven Module Selecton wth Jont Desgn Tme Optmzaton and Post-Slcon Tunng Feng Wang, Xaoxa Wu, Yuan Xe The Pennsylvana State Unversty Department
More informationBeyond Zudilin s Conjectured q-analog of Schmidt s problem
Beyond Zudln s Conectured q-analog of Schmdt s problem Thotsaporn Ae Thanatpanonda thotsaporn@gmalcom Mathematcs Subect Classfcaton: 11B65 33B99 Abstract Usng the methodology of (rgorous expermental mathematcs
More informationResearch Article Green s Theorem for Sign Data
Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationSimultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals
Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More information