NEW MEANS OF CYBERNETICS, INFORMATICS, COMPUTER ENGINEERING, AND SYSTEMS ANALYSIS

Cybernetics and Systems Analysis, Vol. 43, No. 5, 2007

ARCHITECTURAL OPTIMIZATION OF A DIGITAL OPTICAL MULTIPLIER

A. V. Anisimov and I. A. Zavadskyi

UDC 681.35.57

A computational model for estimating the time complexity of logical circuits constructed from elements of an optical element base is investigated. A fast parallel multiplier is constructed.

Keywords: optical switch, switching element, multiplier, logical circuit, synchronous arithmetic.

In the past decade, the search for new physical principles that could underlie future processors has intensified in the design of high-speed parallel computers. Increasing attention is being given to light as an information carrier. On the one hand, the development of nanophotonic technologies in the near future will make it possible to control individual photons as bits in a quantum computer. On the other hand, the control of ultrashort laser pulses makes it possible to perform classical computations at tens of gigahertz.

A change in the basic physics of computers requires new approaches to the modeling of their functioning. In Sec. 1 of this article, a model of computations is investigated that allows one to estimate the speed of logical circuits with allowance made for the distinctive features of processing optical signals. In Sec. 2, this model is used in designing a parallel optical multiplier and, in Sec. 3, its time and space complexity are estimated and compared with well-known types of parallel multipliers, in particular, with the device proposed in [1].

This line of investigation is motivated by the results of current work on creating and perfecting miniature optical switches in many research centers of Japan, Western Europe, and the USA. Several companies produce industrial models of such devices. Their characteristics allow one to create networks consisting of several switches or several tens of sequentially connected switches without any additional equipment such as optical amplifiers.
This is amply sufficient for wide use of optical switches in telecommunications but insufficient for the construction of high-speed logical circuits and low-power computers. Nevertheless, the creation of optical switches that can be used as an element base of universal processors is the objective of several international programs. Taking into account the high rate of improvement of the characteristics of optical switching elements during the past 5–7 years, the creation of an efficient digital optical processor can be considered a problem that can be solved within the next decade. The most promising basic elements of optical logical circuits are Fabry–Perot microresonators and Mach–Zehnder switches in photonic crystals. A detailed description of the principle of operation and characteristics of these devices can be found in [2] and [3], respectively, and a brief review of them is given in [1].

1. MODELING OF OPTICAL COMPUTERS

In contrast to the majority of investigations in the field of optical computations, the authors focus their attention not on the optimization of physical parameters of an element base but on the construction of logical circuits in which the distinctive features of optical switches are used with maximal efficiency. We consider erroneous the widespread approach to the creation of optical logical circuits that consists of constructing, from switches, some collection of basic gates (for example, disjunctions, conjunctions, negations, one-digit adders, etc.) and then substituting these gates for the corresponding elements in well-known circuits constructed from the traditional transistor-resistor element base.

Taras Shevchenko University, Kiev, Ukraine, mi@unicyb.kiev.ua; zava@ukr.net. Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 165–177, September–October 2007. Original article submitted May 14, 2007.
1060-0396/07/4305-0749 © 2007 Springer Science+Business Media, Inc.

Fig. 1. Switching element: (a) with two information inputs and one output; (b) with one information input and two outputs.

Fig. 2. A single-clock switch circuit.

Fig. 3. Switching circuit whose time complexity depends on the value of $t_{sw}/t_{trans}$.

The inefficiency of this approach stems from the fact that the model of computations used for estimating the complexity of traditional logical circuits does not allow one to adequately estimate the time characteristics of optical circuits. To prove this thesis, let us consider some key distinctive features of digital optical computations.

All the signals in computing circuits constructed from electronic logical elements are of equal status but, in optical switches, one must distinguish between two types of signals, namely, control and information ones. These signals are usually of different nature; for example, a control signal can be electric and an information signal optical, or both signals can be light fluxes with different wavelengths. The interaction between control and information signals is based on a definite physical process, for example, the pumping or relaxation of a resonator. The speed of this process and also the speed of transformation of an information signal into a control one determine the switch time of an optical element (we denote it by $t_{sw}$). The time of transfer of an information signal through an elementary optical device ($t_{trans}$) is almost the same as the time of its transmission along a fragment of a waveguide that has the same length as the switch itself. The value of $t_{trans}$ must also take into account the time the information signal needs to cover the minimal possible distance between switches. For the majority of modern optical switches, the ratio $t_{sw}/t_{trans}$ ranges from several units to several hundreds.
Thus, computations can potentially be accelerated by constructing logical circuits so that the smallest possible number of control signals is computed sequentially.

An abstraction of a switch is a switching element (SE). It has no more than two information inputs and outputs and also one control input. Depending on the value arriving at the control input, signals from some information inputs are transmitted to definite information outputs. In [1] and [4], SEs with two information inputs and one information output (Fig. 1a) were considered. For the unit control signal, the information output is connected with the input denoted by a single line and, for the zero one, it is connected with the input denoted by a double line. In this case, the signal at the information input that is not connected with the output is lost. Such losses can be avoided with the help of an SE with one information input and two outputs (Fig. 1b). For the zero control signal, the input is connected with the output denoted by a double line and, for the unit signal, it is connected with the output denoted by a single line. In this case, the signal is absent at the output that is not connected to the input, i.e., this output is considered to be zero. In what follows, we will use precisely such SEs, which do not lead to losses of information signals and, hence, allow one to construct circuits with considerably lower power consumption. It is precisely an SE with one information input and two outputs that models a Mach–Zehnder switch.

Fig. 4. Y-connection of waveguides that realizes the operation OR in a photonic crystal.

If we connect information inputs of some SEs to information outputs of other SEs, then a circuit will be constructed in which all the SEs are simultaneously switched, i.e., a single-clock switch circuit. The total operation time of such a circuit equals $t_{sw} + L\,t_{trans}$, where $L$ is the number of SEs in the longest path connecting an arbitrary input with an arbitrary output. For example, the operation time of the circuit presented in Fig. 2 equals $t_{sw} + 2t_{trans}$. This principle of determining time complexity is easily generalized to the case when a computer consists of several single-clock circuits connected so that any of them can begin its operation only after the completion of operation of all the circuits on which it depends. Devices of this type that are designed for the execution of the most important arithmetic operations are considered in [1]. In many cases, their time complexity determined as above is essentially less than the time complexity of the devices that perform the same operations with the highest speed from the viewpoint of the classical approach, according to which the operation time of a circuit is determined by the largest number of elements along the path from its inputs to outputs.

However, the principle described above does not allow one to adequately estimate the time complexity of switching circuits in the general case since, depending on the value of $t_{sw}/t_{trans}$, the operation time of the same circuit can be determined by different factors. For example, let us consider the circuit presented in Fig. 3. If we assume that signals simultaneously arrive at all the inputs of the circuit at the moment of time 0, then SE 2 switches at the moment of time $2t_{sw} + t_{trans}$ and its information input is computed at the moment $t_{sw} + 4t_{trans}$. Hence, if we have $t_{sw} \ge 3t_{trans}$, then the operation time of the circuit equals $2t_{sw} + 2t_{trans}$, and if we have $t_{trans} \le t_{sw} < 3t_{trans}$, then its time complexity amounts to $t_{sw} + 5t_{trans}$.
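The two regimes of the Fig. 3 example can be checked numerically. The following is a minimal sketch, assuming the event times as read from the text: SE 2 finishes switching at $2t_{sw} + t_{trans}$, its information input is ready at $t_{sw} + 4t_{trans}$, and one more $t_{trans}$ is needed to pass through SE 2. The function names are illustrative and not taken from the article.

```python
def single_clock_time(t_sw, t_trans, longest_path):
    """Operation time of a single-clock switch circuit: t_sw + L * t_trans."""
    return t_sw + longest_path * t_trans


def fig3_operation_time(t_sw, t_trans):
    """Operation time of the Fig. 3 circuit under the assumptions stated above."""
    switch_ready = 2 * t_sw + t_trans   # SE 2 has finished switching
    info_ready = t_sw + 4 * t_trans     # information signal reaches SE 2
    # The signal passes SE 2 only when both events have happened.
    return max(switch_ready, info_ready) + t_trans


if __name__ == "__main__":
    for t_sw in (1, 2, 3, 8):
        print(t_sw, fig3_operation_time(t_sw, 1))
```

With $t_{trans} = 1$, the result switches from $t_{sw} + 5t_{trans}$ to $2t_{sw} + 2t_{trans}$ exactly at $t_{sw} = 3t_{trans}$, where both expressions coincide.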
Let us consider the formula that specifies the operation time of switching circuits and allows one to solve the mentioned problems. As is easy to see, in the general case, the operation time of a circuit $s$ is determined by the value of

$$\max_{v \in V_s} \{\, c(v)\,t_{sw} + l(v)\,t_{trans} \,\}. \qquad (1)$$

Here, $V_s$ is the set of all the paths from the inputs of the circuit $s$ to its outputs, $v$ is some path, i.e., a sequence of adjacent SEs that can be connected by information and control signals, $c(v)$ is the number of control signals along the path, and $l(v)$ is the total number of SEs along the path.

As a rule, the better the modeling means being used, the more perfect the devices constructed with their help. In particular, striving to minimize the time complexity computed by formula (1), we will construct (in Sec. 2) a parallel optical multiplier whose size is considerably smaller and whose architecture is simpler than those of the circuit for multiplication of multidigit numbers from [1].

Before describing the structure of the multiplier, we will pay attention to one more distinctive feature of digital optical computations. In addition to active elements such as switches, logical circuits also contain passive elements that do not change their states as a result of definite physical processes. They include optical waveguides, waveguide bends, waveguide branchings, etc. In our case, it is especially important that optical connectors realizing the logical function OR are also passive elements. In particular, in [5], the Y-shaped connection of waveguides in a photonic crystal (Fig. 4) is considered and it is shown that, when a defect of special form is created at the junction point of the waveguides, the optical energy is transmitted in the directions $a_1 \to b$ or $a_2 \to b$ rather than in the directions $a_1 \to a_2$ or $a_2 \to a_1$. Other passive devices for connection of waveguides, for example, a directional coupler [6, 7], are also widely used.
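Formula (1) is straightforward to evaluate once the paths of a circuit are enumerated. The sketch below assumes that each path is summarized by a pair (number of control signals, number of SEs); this representation and the two sample paths, chosen to mimic the Fig. 3 example, are illustrative assumptions rather than data from the article.

```python
def operation_time(paths, t_sw, t_trans):
    """Evaluate formula (1): the maximum over paths of c*t_sw + l*t_trans.

    `paths` is an iterable of (c, l) pairs, where c is the number of
    control signals along the path and l is the number of SEs on it.
    """
    return max(c * t_sw + l * t_trans for c, l in paths)


if __name__ == "__main__":
    # Hypothetical path summary for a circuit like the one in Fig. 3:
    # one path with two sequential control signals and two SEs,
    # one path with a single control signal and five SEs.
    sample_paths = [(2, 2), (1, 5)]
    print(operation_time(sample_paths, t_sw=8, t_trans=1))
    print(operation_time(sample_paths, t_sw=2, t_trans=1))
```

Which path dominates depends on the ratio $t_{sw}/t_{trans}$, which is exactly the effect that the single-clock estimate cannot capture.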
Thus, the execution of the operation OR in optical calculators does not involve a physical process that takes some time; it is performed as a result of a special connection of the channels along which optical signals are transmitted. This distinctive feature is used in Sec. 2, where the logical OR is denoted by a filled triangle and is not taken into account in determining the time complexity of circuits. It is worth noting that the multiplying circuit considered in Sec. 2 can also be realized without OR connectives if we use SEs with two information inputs and one output.

Fig. 5. Determination of the parity of the number of unities in a column and the carry to its adjacent column.

Fig. 6. Circuit for translation of multirow codes into one-row ones.

2. OPTICAL MULTIPLIER

The circuit that is described below and realizes the multiplication operation is an improved version of the circuit proposed in [1]. Both these circuits belong to the class of multilevel matrix multipliers. A matrix of partial products is applied to the input of the first level of such a multiplier. At the output of the $i$th level, we obtain $k_i < k_{i-1}$ numbers whose sum is equal to the sought-for product. In particular, at the output of the last level, we obtain two numbers that should be added together.

Multilevel matrix multipliers differ in the method of translating the sum of $k_{i-1}$ numbers (a $k_{i-1}$-row code) into the sum of $k_i$ numbers (a $k_i$-row code). In particular, in [1], the matrix of addends is partitioned into cells, and the circuit that is described in detail in [4] and is designed for the transformation of multirow codes into one-row ones is applied to each of these cells (hereafter, we call the multiplier from [1] cellular). An important advantage of the circuit from [4] is its two-clock switch time, but it also has two essential drawbacks. First, the circuit from [4] is rather long: if it is applied to a cell of size $a \times b$, then the length of the path over which an information signal is transmitted is proportional to $ab$. To decrease the length of the circuit from [4] in a cellular multiplier, it is applied not to the entire matrix of partial products but to its fragments. Second, the size of the circuit from [4] leaves something to be desired since, for the summation of $a$ $b$-bit numbers, it is proportional to $a^2 b$. We will describe a circuit that makes it possible to perform a similar operation with the help of $O(ab)$ SEs.
It is the parallel application of such circuits to fragments of the matrix of addends that forms the transformation performed at each or some levels of the multiplier described in this work.

2.1. A Circuit Translating a Multirow Code into a One-Row One. In constructing the circuit described below, as well as in designing the circuit from [4], we proceeded from the following fact: the $i$th bit of the sum is determined by the parity of the number of unities in the $i$th column of the matrix of addends with allowance made for all the carries arriving from the bits to the right. However, the method of determining the parity of the number of unities in a column and the method of transferring carries to the columns on the left are different in the two circuits.

In the circuit from [4], for calculation of the number of unities, a one-cycle decoder was used that translates an arbitrary binary vector into a vector of the form 0...01...1 with the same number of unities. Then the $j$th output of the decoder was used as the $2j$th information input of the decoder that processes the next column. Thus, a carry was realized, i.e., instead of each pair of unities, one unity was written in the adjacent column to the left. The decoder has quadratic space complexity with respect to the length of the vector being decoded, and the bits of the vector are its control signals. Since decoders are connected only by information inputs/outputs, the entire circuit from [4] is switched during one clock cycle.

The parity of the number of unities in a vector can also be found with the help of a linear-size circuit, and we will replace the decoder by such a scheme. The same circuit will realize the transfer of pairs of carries to the adjacent column to the left. Note that a linear circuit contains no information channels in which pairs of unities would be accumulated. They are also absent in the similar circuit that processes the column to the left and, hence, the only way of introducing carries into the circuit that processes the column to the left is to use them as some of its control inputs. The structure of a circuit that processes a column and also the technique of connection of circuits corresponding to adjacent columns are shown in Fig. 5.

Let us consider the principle of operation of the circuit presented in Fig. 5. Each pair of SEs located at one horizontal level is controlled by one signal. We note that the bits 01 or 10 arrive at the inputs of any pair of SEs controlled by the $ij$th signal $x_{ij}$, and the choice of a concrete pair of bits is determined by the parity of the number of unities in the vector $x_{1j}, \dots, x_{i-1,j}$. This can be easily proved by induction. Let us consider the SEs that are denoted by the numbers 1–4 in Fig. 5. We denote their inputs by $In_1$–$In_4$, their 0-outputs by $Out_1^0$–$Out_4^0$, and their 1-outputs by $Out_1^1$–$Out_4^1$. If we have $x_{11} = 0$, then we obtain $In_1 = 0$ and $In_2 = 1$, but if we have $x_{11} = 1$, then, vice versa, we obtain $In_1 = 1$ and $In_2 = 0$. We now assume that SEs 1 and 2 form an arbitrary pair of SEs in the circuit that processes the first column. If we have $x_{21} = 1$, $In_1 = 0$, and $In_2 = 1$, then we obtain $Out_1^0 = Out_1^1 = Out_2^0 = 0$, $Out_2^1 = 1$, $In_3 = Out_1^0 \vee Out_2^1 = 1$, and $In_4 = Out_1^1 \vee Out_2^0 = 0$. It is also obvious that we have

$x_{21} = 1$, $In_1 = 1$, $In_2 = 0 \Rightarrow In_3 = 0$, $In_4 = 1$;
$x_{21} = 0$, $In_1 = 1$, $In_2 = 0 \Rightarrow In_3 = 1$, $In_4 = 0$;
$x_{21} = 0$, $In_1 = 0$, $In_2 = 1 \Rightarrow In_3 = 0$, $In_4 = 1$.

Thus, if we have $x_{ij} = 0$, then the same signals arrive at the inputs of the $(i+1)$th pair of SEs as at the inputs of the $i$th pair, and if we have $x_{ij} = 1$, then the pair of signals 01 is replaced by 10 and vice versa.
This implies that if the vector $x_{1j}, \dots, x_{i-1,j}$ contains an odd number of unities, then 10 arrives at the inputs of the pair of SEs controlled by the $ij$th signal $x_{ij}$ and 01 arrives otherwise, i.e., the zero output of the last left element connected by an OR connective with the unit output of the last right element is equal to the oddness of the number of unities in the corresponding column of the matrix of addends.

Let us consider the process of transfer of a carry to the left column. Again, we assume that SEs 1 and 2 in Fig. 5 form an arbitrary pair of SEs that is controlled by the $ij$th signal $x_{ij}$. A carry signal must be generated in the case when $x_{ij}$ forms the next pair of unities in the vector $x_{1j}, \dots, x_{ij}$, i.e., if we have $In_1 = 1$ and $x_{ij} = 1$. In order that this signal influence the parity of the number of unities in the column to the left, it should be transformed into a control signal for a pair of SEs in the circuit that processes this column (see Fig. 5). Note that carry signals cannot be generated by two adjacent bits of the vector $x_{1j}, \dots, x_{ij}$ and, hence, the carry generated by SE 1 is connected by an OR connective with the carry generated by SE 3.

We will also pay attention to the fact that the carry signals from the right column are processed only after processing all the bits of the left column. Though, from a logical viewpoint, carry signals can be processed at any moment, for example, before processing the bits of a column or alternately with such processing, it is processing them last, and precisely in the order in which they are generated by the right column, that makes it possible to optimize the time characteristics of the circuit, which will be shown below.

The technique of connection of circuits processing several columns of a matrix of addends is represented in Fig. 6. We call the circuit obtained as a result of this connection basic.
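The logic of the column circuit can be imitated in software to check that parity outputs and merged carries really add up a matrix of addends. The following is a minimal behavioral sketch, not a timing model: the dual-rail pair 10/01 is represented by a single flag `odd`, each column processes its own bits first and then the carries from the column to its right, and the closing circuit W is replaced by simply continuing the cascade until no carries remain. All function names are hypothetical.

```python
def process_column(controls):
    """Return (parity, merged_carries) for one column.

    `controls` lists the control bits in processing order: the bits of the
    column itself followed by the carries from the column to the right.
    """
    odd = 0            # 1 means the pair 10 is on the rails, 0 means 01
    carries = []
    for x in controls:
        carries.append(odd & x)  # x completes a pair of unities
        odd ^= x                 # x = 1 swaps 01 and 10
    if len(carries) % 2:
        carries.append(0)
    # Two adjacent levels never both emit a carry, so they share an OR.
    merged = [carries[k] | carries[k + 1] for k in range(0, len(carries), 2)]
    return odd, merged


def columns_from_rows(rows, width):
    """Split integer addends into bit columns, least significant first."""
    return [[(r >> j) & 1 for r in rows] for j in range(width)]


def add_rows(columns):
    """Sum the addends given as bit columns using column circuits only."""
    result_bits, carries, j = [], [], 0
    while j < len(columns) or any(carries):
        bits = columns[j] if j < len(columns) else []
        parity, carries = process_column(list(bits) + carries)
        result_bits.append(parity)
        j += 1
    return sum(bit << k for k, bit in enumerate(result_bits))


if __name__ == "__main__":
    rows = [0b1011, 0b0111, 0b1101]
    print(add_rows(columns_from_rows(rows, 4)), sum(rows))
```

Because each merged carry line serves two control levels, the carry part of a column has half as many levels as the whole column to its right, which is the property used in the length estimates of this section.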
Here, $S_1$, $S_2$, and $S_3$ are circuit fragments processing the bits of the matrix of addends, $P_2$–$P_5$ are fragments processing carry bits, and $W$ denotes a single-clock circuit computing the weight of the vector obtained at the output of the circuit $P_5$. The structure of the circuit $W$ is presented in Fig. 7. In this circuit, SEs form a decoder with a vector of the form $(\sigma_k, \dots, \sigma_1) = (0 \dots 010 \dots 0)$ at its output. In this vector, unity is located at a position $w$ if the outputs of the last circuit $P_j$ contain $w$ unities. If the outputs of the last circuit $P_j$ contain no unity, then we obtain the zero vector at the output of the decoder. Given a vector $(\sigma_k, \dots, \sigma_1)$, we can easily determine the number $w = (w_m, \dots, w_0)$ with the help of multibit OR circuits: $w_0 = \sigma_1 \vee \sigma_3 \vee \sigma_5 \vee \dots$, $w_1 = \sigma_2 \vee \sigma_3 \vee \sigma_6 \vee \sigma_7 \vee \dots$, and $w_2 = \sigma_4 \vee \sigma_5 \vee \sigma_6 \vee \sigma_7 \vee \sigma_{12} \vee \sigma_{13} \vee \sigma_{14} \vee \sigma_{15} \vee \dots$. These circuits are realized with the help of two-input OR circuits connected in the form of a treelike or linear structure and do not require any time for switching.

We denote by $SP_i$ the entire circuit that processes the $i$th column and consists of fragments $S_i$ and $P_i$. All these circuits are of the form that is presented in Fig. 5 and that corresponds to the rectangle enclosed by a dotted line in Fig. 6. Note that the processing of carries from the leftmost column, in addition to the fragments $P_i$ located above the fragments $S_i$, requires some additional circuits $P_j$; their number does not exceed the number of fragments $S_i$, and its exact value will be determined below. We will also show that fragments $P_i$ at most double the size of the circuits that process columns. This means that the addition of $a$ $b$-bit numbers requires $O(ab)$ SEs. In contrast to the circuit from [4], which switches in two clock cycles, the switch time of the circuit being considered exceeds the number of columns, which is the price paid for the decrease in size, but the loss can be reduced to zero by varying the lengths of columns.
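The translation from the one-hot decoder output to the binary weight is purely combinational. A minimal sketch is given below, assuming that position $p$ of the decoder output contributes to bit $w_r$ exactly when bit $r$ of $p$ is set; the decoder itself is modeled behaviorally by counting unities, and the function names are illustrative.

```python
def weight_decoder(bits):
    """Behavioral model of the decoder in W.

    Returns sigma[0..k], where sigma[0] is unused and sigma[w] = 1 if the
    input contains w >= 1 unities; the zero vector is returned for w = 0.
    """
    k = len(bits)
    w = sum(bits)
    sigma = [0] * (k + 1)
    if w:
        sigma[w] = 1
    return sigma


def binary_weight(sigma):
    """Binary digits (w_0, w_1, ...) of the weight via multi-input ORs."""
    k = len(sigma) - 1
    m = max(k.bit_length(), 1)
    return [
        int(any(sigma[p] for p in range(1, k + 1) if (p >> r) & 1))
        for r in range(m)
    ]


if __name__ == "__main__":
    sigma = weight_decoder([1, 0, 1, 1, 0, 1, 1])
    print(sigma, binary_weight(sigma))
```

Since only ORs of decoder outputs are involved, this stage adds no switching delay in the model of Sec. 1.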
The idea consists of ensuring that an information signal does not wait for the corresponding control signal but, during the delay caused by switching, passes along the chain of SEs for which control signals are already computed. In other words, the lengths of circuits $S_i$ should be selected so that they all come into operation simultaneously and, at the same time, the information signal arrives at the last pair of SEs in a fragment $P_i$ exactly at the moment of their switching.

Let $L$ be the number of SEs in circuits $S_i$ or $P_i$ through which the information signal passes during the switching of one SE. $L$ approximately equals $t_{sw}/t_{trans}$ if $t_{trans}$ and $t_{sw}$ take into account the time of transformation of an information signal into a control one, the time of passage of a signal along the branches of optical waveguides between switches, and other overheads. Let $LS_i$, $LP_i$, and $LSP_i$ denote the lengths of fragments $S_i$, $P_i$, and the entire circuit of processing the $i$th column, respectively, i.e., we have $LSP_i = LS_i + LP_i$. Let us show that, to meet the above conditions of absence of idle time in the circuit, it is necessary and sufficient that the following equality be true when $i > 1$:

$$LS_i = (LS_1 + iL)/2. \qquad (2)$$

By the construction of the basic circuit, the last SE in each column is switched a time $t_{sw}$ after the switching of the last SE in the previous column. During this time, the information signal (IS) in each circuit $SP_i$ in which it has not yet reached the end has time to pass through $L$ SEs. Moreover, the IS in each column propagates during the same time as the IS in the first column, i.e., any IS passes through $LS_1$ SEs even before the switching of the last SE in the first column. Thus, we have the relationship

$$LSP_i = LS_1 + (i - 1)L. \qquad (3)$$

On the other hand, taking into account that to each two control signals in a circuit $SP_{i-1}$ corresponds one control signal in the fragment $P_i$, we obtain the relationship

$$LP_i = LSP_{i-1}/2. \qquad (4)$$

Relationships (3) and (4) imply equality (2) since we have

$$LS_i = LSP_i - LP_i = LS_1 + (i - 1)L - LSP_{i-1}/2 = LS_1 + (i - 1)L - (LS_1 + (i - 2)L)/2 = (LS_1 + iL)/2.$$

In particular, as is obvious from relations (2)–(4), we have $LP_i < LS_i$ and, hence, the circuit size linearly depends on the number of bits in the matrix of addends.
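Relations (2)–(4) can be tabulated and cross-checked with exact arithmetic. The sketch below assumes the formulas in the form $LSP_i = LS_1 + (i-1)L$, $LP_i = LSP_{i-1}/2$, and $LS_i = LSP_i - LP_i$; the numeric values $LS_1 = 20$ and $L = 240$ are used purely as an illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def fragment_lengths(ls1, L, b):
    """Return rows (i, LS_i, LP_i, LSP_i) for columns i = 1..b."""
    rows = []
    for i in range(1, b + 1):
        lsp = Fraction(ls1 + (i - 1) * L)                      # relation (3)
        lp = Fraction(ls1 + (i - 2) * L, 2) if i > 1 else Fraction(0)  # (4)
        ls = lsp - lp                                          # LSP = LS + LP
        rows.append((i, ls, lp, lsp))
    return rows


if __name__ == "__main__":
    for i, ls, lp, lsp in fragment_lengths(20, 240, 4):
        # For i > 1 the value of LS_i must agree with equality (2).
        expected = Fraction(20 + i * 240, 2)
        print(i, ls, lp, lsp, ls == expected or i == 1)
```

For every $i > 1$ the difference $LS_i - LP_i$ equals $L$, which makes the inequality $LP_i < LS_i$ explicit.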
Fig. 7. The circuit W that processes high-order bits.

Assuming that a circuit contains $b$ columns $S_i$ that process $n$ bits in all and $c$ columns $P_j$ that do not belong to $SP_i$ and that its rightmost column processes $a = LS_1$ bits, we determine the relationship between the parameters $a$, $b$, $c$, and $n$. The parameter $b$ of the basic circuit is expressed through the parameter $a$ and the circuit length $n$, namely, $b$ is the largest integer that satisfies inequality (5), which follows from relationship (2):

$$a + \frac{1}{2}\sum_{i=2}^{b} (a + iL) \le n. \qquad (5)$$

The parameter $a$ should be selected so that, for the integer $b$ corresponding to it, relationship (5) is as close to equality as possible. This makes it possible to avoid time losses connected with the delay of the beginning of propagation of ISs through the circuit $S_b$ if its length is too small. Among all the values of $a$ that satisfy this condition, the smallest value should be chosen. This becomes obvious if we note that the operation time of the entire cascade of circuits $SP_i$ equals $t_{trans}\,LSP_b + t_{sw}$, i.e., it decreases with decreasing $LSP_b$. To a smaller $a$ corresponds a larger $b$ and, hence, a smaller length of the circuit $SP_b$.

We will also investigate the following question: what is the expedient number of circuits $P_j$ that do not belong to $SP_j$, i.e., we estimate the value of the parameter $c$. These circuits cannot begin to operate simultaneously, and since the length of each next circuit $P_j$ is half the length of the previous one, $P_j$ must begin to operate at the moment that allows it to complete its operation a time $t_{sw}$ after the completion of operation of $P_{j-1}$. Therefore, the computation of all the bits of the sum with the help of the cascade of circuits $P_j$ is inefficient from the viewpoint of time complexity. Instead, when the length of the vector of carries becomes sufficiently small, we will use one circuit $W$ after the cascade consisting of $c$ circuits $P_j$ for computation of the remaining bits of the sum.

In determining the value of $c$, we proceed from the fact that the size of the circuit $W$ should not distort the general estimate of space complexity. This size amounts to $(LP_c/2)^2/2 = LP_c^2/8$ SEs, where $LP_c$ is the length of the last of the $c$ additional circuits. But if these $LP_c$ bits are also further processed by circuits $P_j$, then we will need $2(LP_c/2 + LP_c/4 + \dots + 1) \approx 2LP_c$ SEs. Thus, as soon as the inequality $LP_c^2/8 < 2LP_c$ is fulfilled, which is equivalent to inequality (6) since $LP_c$ is an integer, the use of circuits $P_{c+1}, P_{c+2}, \dots$ loses any meaning at all and their cascade should be replaced by one circuit $W$:

$$LP_c \le 15. \qquad (6)$$

The use of the circuit $W$ in the case when we have $LP_c > 15$ will reduce the operation time of the basic circuit at the cost of an increase in its size.

We estimate the size and total operation time of the basic circuit. If we assume that the circuit $W$ is absent and that all carries are processed by a cascade of circuits $P_j$, then a sufficiently exact upper bound of the circuit size can be easily obtained. To each of the $n$ inputs of the circuit correspond two SEs. Moreover, each pair of bits generates its carry bit, each pair of carry bits generates one more such bit, etc., i.e., no more than $(n/2 + n/4 + \dots + 1) \approx n$ carry bits are generated in the aggregate and each of them is processed by two SEs. In total, we obtain that the circuit size is no more than $4n$ SEs. When $LP_c > 15$, this estimate can be larger by at most $LP_c^2/8 - 2LP_c$. But if we have $LP_c < 15$, then the mentioned value will be negative and it also should be added to $4n$. Since we have $LP_c = LSP_b/2^c = (a + (b - 1)L)/2^c$, we obtain the following formula for the upper estimate of the basic circuit size:

$$4n + LP_c^2/8 - 2LP_c, \quad \text{where } LP_c = (a + (b - 1)L)/2^c. \qquad (7)$$

The computation of the total operation time of the basic circuit is straightforward: all the SEs in the circuits $S_1$–$S_b$ are first switched in parallel, then the IS is propagated through these circuits and, when it reaches the end of the circuit $S_1$, the last SEs of circuits $P_1$–$P_{b+c}$ are sequentially switched without delays and the last SEs in each column of the circuit $W$ are switched after them. The time of transfer of the information signal through the circuits $SP_2$–$SP_b$ and $P_{b+1}$–$P_{b+c}$ can be neglected since the IS is transferred during switchings, but one should take into account the time of transfer of the IS through the circuit $W$, whose length is equal to $(a + (b - 1)L)/2^{c+1}$. As a result, we obtain the following formula for the determination of the total operation time of the basic circuit:

$$(b + c + 1)\,t_{sw} + \left(a + (a + (b - 1)L)/2^{c+1}\right) t_{trans}. \qquad (8)$$

Table 1 presents the characteristics of the basic circuit constructed from SEs realized with the help of Fabry–Perot microresonators ($t_{sw} = 8$ ps, $t_{trans} = 0.033$ ps, and $L = 240$) for different values of $n$ that were chosen so that the parameter $a$ was equal to 20. In this case, the value of $b$ was determined as the largest integer satisfying inequality (5) and, hence, the values of $n$ presented in the table are the least $n$ for given values of $b$. After fixing the values of $a$, $b$, and $n$, the parameter $c$ varied from the least value satisfying inequality (6) to a value that was less by 1–2 and thereby determined various ratios between time and space complexities. For circuits realized with the help of another element base, similar calculations can be performed.

TABLE 1

n      a; b; c     Size, SEs   Additional size, SEs   Operation time of the circuit, ps
120    20; 2; 4    480         0                      56
       20; 2; 3    534         54                     48
360    20; 3; 5    1440        0                      72
       20; 3; 4    1494        54                     64
       20; 3; 3    1772        332                    57
720    20; 4; 6    2875        -5                     88
       20; 4; 5    2898        18                     80
       20; 4; 4    3045        165                    73
1200   20; 5; 6    4800        0                      96
       20; 5; 5    4854        54                     88
       20; 5; 4    5132        332                    81
1800   20; 6; 7    7194        -6                     112
       20; 6; 6    7206        6                      104
       20; 6; 5    7299        99                     97

Fig. 8. Configuration of inputs of circuits that translate multirow codes into one-row ones.

2.2. Multiplying Circuit. The matrices of addends that are located at each level of a multilevel matrix multiplier should be completely covered by the inputs of the circuits that translate multirow codes into one-row ones. Note that, though the input columns of each such circuit can be arbitrarily formed from bits of the corresponding columns of the matrix of addends, from the viewpoint of decreasing the length of interelement connections, the arrangement of inputs in the form of a staircase such as that presented in Fig. 6 is most convenient. In this case, the entire matrix of partial products at the first level of the multiplier can be covered by the inputs of the circuits of translation of multirow codes as is shown in Fig. 8, where $i$ denotes the inputs of the $i$th circuit. As a rule, this matrix is represented in the form of a parallelogram, but it can also be represented in the form of a triangle by arranging the contacts in the left part of the parallelogram in a different way.

Since the size of the basic circuit linearly depends on the number of bits processed by it, a partitioning of the matrix of addends into horizontal strips each of which is processed by a collection of basic circuits will not decrease the total size. Calculations have shown that such a partitioning does not essentially reduce the operation time of each basic circuit; therefore, we will assume that the inputs of these circuits cover the largest possible number of bits from the upper end of the matrix to its lower end. Gaps remain at the corners of the triangle of the matrix of addends, but they can be eliminated by a proper selection of the value of the parameter $a$ of the corresponding basic circuits and also by replacing condition (2) by the inequality $LS_i \le (LS_1 + iL)/2$.
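Estimates (6)–(8) are easy to evaluate for a given element base. The following is a minimal sketch under stated assumptions: the length of the last additional circuit is taken as $(a + (b-1)L)/2^c$, the search for the least admissible $c$ rounds this length down to an integer (an assumption that reproduces the value $c = 5$ quoted in the text for $a = 20$, $b = 3$, $L = 240$), and the size and time follow formulas (7) and (8) without any rounding. The function names are illustrative, and the results need not coincide digit for digit with Table 1, whose rounding conventions are not stated.

```python
def lp_extra(a, b, c, L):
    """Length of the last of the c additional carry circuits."""
    return (a + (b - 1) * L) / 2 ** c


def least_c(a, b, L, limit=15):
    """Least c for which the rounded-down length satisfies inequality (6)."""
    c = 1
    while (a + (b - 1) * L) // 2 ** c > limit:
        c += 1
    return c


def size_upper_bound(n, a, b, c, L):
    """Upper estimate (7) of the basic circuit size in SEs."""
    lp = lp_extra(a, b, c, L)
    return 4 * n + lp * lp / 8 - 2 * lp


def operation_time(a, b, c, L, t_sw, t_trans):
    """Total operation time (8) of the basic circuit."""
    w_length = (a + (b - 1) * L) / 2 ** (c + 1)
    return (b + c + 1) * t_sw + (a + w_length) * t_trans


if __name__ == "__main__":
    a, b, L = 20, 3, 240
    for c in (5, 4, 3):
        print(c,
              round(size_upper_bound(360, a, b, c, L), 1),
              round(operation_time(a, b, c, L, 8.0, 0.033), 2))
```

Decreasing $c$ by one removes one switching period $t_{sw}$ from the operation time while the quadratic term $LP_c^2/8$ roughly quadruples, which is the time and space trade-off discussed above.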
The only requirement imposed on the circuits that cover bits at the corners is that their operation time must not exceed the operation time of the largest basic circuit, denoted by the number 2 in Fig. 8.

At the first level of the multiplier of $n$-bit numbers, the largest basic circuit will have $n$ inputs. Its operation time is specified by formula (8) and is equal to the time of processing the first level. As is easily seen, the word length of the number into which $n$ input bits are translated by the largest basic circuit exceeds the word length of the number $LSP_b$ by $b - 1$, i.e., amounts to $b - 1 + \log LSP_b + 1 = b + \log(a + (b - 1)L)$ bits. Obviously, this value can serve as the upper bound of the number of rows of the code into which an $n$-row matrix of partial products is translated by the circuit being considered. Computations showed that, for $n$ within 10 thousand and for the values of $L \ge 4$ (this inequality is fulfilled for all the considered types of element bases), this value does not exceed several tens, and it is precisely the number of addends arriving at the second level of the multiplier. They can be processed either with the help of the circuit considered in this work or with the help of the circuit from [1]. Against the background of the total space complexity, the size of the circuit from [1] applied to such a small number of addends will not be considerably larger than the size of the circuit that is described in this work and realizes computations at the second level of the multiplier. However, the circuit from [1] can turn out to be faster and, hence, it makes sense to construct a combined multiplier that performs computations according to the scheme described in this work only at the first or at the first and second levels, whose outputs become inputs of the circuit from [1].

Let us estimate the size of the circuit located at the first level of the multiplier.
Since the lengths of circuits that translate a multirow code into a one-row one are different and the values of the parameters $a$, $b$, and $c$ also vary, the use of formula (7) for estimation of the total size of the first level of the multiplier seems to be difficult. A sufficiently exact upper bound of this size can be obtained using the following approach, similar to that used in estimating the size of the basic circuit: to each bit of the matrix of partial products correspond two SEs, and the total number of such bits in multiplying two $n$-bit numbers will be equal to $n^2$, which yields $2n^2$ SEs. Moreover, each pair of bits generates its carry bit, each pair of carry bits generates one more carry bit, etc., i.e., the total number of produced carry bits will be no larger than $(n^2/2 + n^2/4 + \dots + 1) \approx n^2$, each bit being processed by two SEs. In total, we obtain that the size of the circuit is no larger than $4n^2$. We note that the size of the two-level multiplier from [1] is asymptotically larger since it is equal to $O(n^{5/2})$ SEs, and the multiplier considered in this article, in the case when it is constructed from Fabry–Perot microresonators and when the values of $n$ amount to 10 thousand, will also be two-level.

Let us make some comments on the value of the parameter $c$ in basic circuits. As has been noted in item 2.1, it would make no sense to assign to this parameter a value larger than the least number for which the inequality $LP_c \le 15$ is fulfilled. If this inequality is true, then the value of $4n^2$ remains the upper bound of the circuit size but, for circuits of larger length, i.e., located closer to the center of the matrix of addends, it makes sense to decrease the value of this parameter since it is precisely these circuits that determine the time complexity of a level, and the operation time of the basic circuit decreases with decreasing $c$. Therefore, in designing basic circuits for concrete values of $n$ and $L$, in the cases when this will essentially improve the time complexity of the largest basic circuit, we will choose $c$ smaller by 1–3 than the least number satisfying the inequality $LP_c \le 15$. In these cases, the time complexity of several basic circuits located at the center of the matrix of addends should be reduced to the time complexity of the largest basic circuit by selection of their parameters $c$.

This approach leads to an excess over the space complexity $4n^2$ that should be taken into account. We assume that the largest basic circuit increases its size by $\Delta$ owing to an additional decrease in the parameter $c$. Then we consider that all the basic circuits in which the value of the parameter $b$ is equal to the value of this parameter in the largest basic circuit also increase their sizes by $\Delta$. In the circuits whose parameter $b$ is smaller by unity, the parameter $c$ can be closer by one to the value satisfying the inequality $LP_c \le 15$ than the parameter $c$ in the largest basic circuit.
We assume that all such circuits also increase their sizes by the same value that, as well as the number of circuits, can be determined from Table 1 or from a similar table for another element base. For example, if we have n 400 and circuits constructed from Fabry-Perot microresonators are considered, then, for the largest basic circuit, we have a 0, b 3, and the least value of the parameter satisfying the inequality LP 15 equals 5. If we put 3, then we have LP 6, the excess of size of the largest basic circuit amounts to LP / 8 LP 359 SEs, and, taking into account the contents of Table 1, the total excess of size can be computed as follows: ( 400 360) 359 ( 360 10) 54 56640 SEs. The described approach allows one to obtain the upper bound of the space complexity of the circuit.

3. COMPARATIVE ANALYSIS OF TIME AND SPACE COMPLEXITY OF MULTIPLIERS

Table 2 presents time and space characteristics of different circuits that translate multirow codes into two-row codes for SEs constructed from Fabry-Perot microresonators. Using the methods described above, the parameters a and b of all basic circuits are uniquely determined from the word length n of the operands. The only parameter that can be varied, and that determines the parameters of the other basic circuits, is the parameter of the largest basic circuit. Let us select reasonable values of this parameter that make it possible to decrease the operation time of the largest basic circuit by 10-20% while increasing the circuit size by the same percentage. A further decrease in this parameter will progressively decrease the operation time while increasing the circuit size. When n 104, after using the circuit described in this work at the first level, the number of addends at the second level will not exceed 1. The circuit from [1] is most efficient for addition of this number of n-bit numbers and transforms the matrix of addends into a two-row code at one level, i.e., the entire circuit will be two-level.
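The base bound of $4n^2$ SEs on the size of the first level, obtained in Sec. 2 by counting two SEs per bit of the matrix of partial products and two SEs per carry bit, can be evaluated for concrete word lengths. The following Python sketch is only an illustration of that counting argument: the function name and the dictionary keys are hypothetical, and the excess of size caused by lowering the parameter of the largest basic circuits is deliberately not modeled.

```python
def first_level_size_bound(n: int) -> dict:
    """Count SEs for the first level using the halving argument.

    Assumptions (illustrative only): two SEs per bit, and every pair of
    bits produces one carry bit, repeated until a single bit remains.
    """
    if n < 1:
        raise ValueError("word length n must be positive")

    product_bits = n * n  # bits in the matrix of partial products

    # Carry bits: n^2/2 + n^2/4 + ... + 1, which never exceeds n^2.
    carry_bits = 0
    remaining = product_bits
    while remaining > 1:
        remaining //= 2
        carry_bits += remaining

    ses_per_bit = 2
    total_ses = ses_per_bit * (product_bits + carry_bits)

    return {
        "product_bits": product_bits,
        "carry_bits": carry_bits,
        "total_ses": total_ses,      # count from the halving argument
        "bound_4n2": 4 * n * n,      # closed-form upper bound
    }


if __name__ == "__main__":
    for n in (64, 128, 256, 512, 1024):
        r = first_level_size_bound(n)
        print(n, r["total_ses"], r["bound_4n2"])
```

The counted value stays below $4n^2$ for every n because the carry series sums to less than $n^2$; any additional size due to the parameter selection discussed in Sec. 2 has to be added on top of this figure.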
In addition to the characteristics of the circuit considered in Sec. 2, for each n, we present the characteristics of a time-optimal circuit [1] and also of an asymptotically fastest classical multilevel multiplier [8] constructed from the elements of the corresponding optical element base. At each level of this multiplier, a three-row code is translated into a two-row code. Switching circuits that realize this transformation are described in [1]. We note that, by increasing the number of levels and the operation time of the circuit from [1], one can reduce its size. When the time complexities of the circuit from [1] and the circuit proposed in this work are equated, the latter will have a size advantage of only about 20-30%, depending on n. However, a smaller number of levels of the multiplier is per se an essential advantage. Of importance is also the fact that the considered circuit of translation of multirow codes is flat and, taking into account subcircuits for processing carries, it can be placed on two planes parallel to the matrix of partial products (if O(n) subcircuits of W are not taken into account). At the same time, a natural construction for the two-level multiplier from [1] consists of the placement of $O(n^{3/2})$ subcircuits from [4] perpendicularly to the matrix of addends. It is also necessary to note that, in the case when optical-optical Mach-Zehnder switches in photonic crystals are used as the element base, the circuit described above has scarcely any size advantage over the multiplier from [1], and its operation time is less by 10-40% when the word length of the operands is within 1000. This is explained by the smallness of the value of t / ttrans for a Mach-Zehnder switch; this smallness implies a small cell size in the multiplier from [1], whereas the circuit considered above reduces the space complexity arising in this multiplier during processing larger cells.

TABLE 2

| n | Circuit described in this paper: operation time, ps | Circuit described in this paper: size, SEs | Cellular multiplier [1]: operation time, ps | Cellular multiplier [1]: size, SEs | Classical multiplier constructed from optical switches: operation time, ps | Classical multiplier constructed from optical switches: size, SEs |
|---|---|---|---|---|---|---|
| 64 | 51 | 16400 | 9 | 133100 | 89 | 54600 |
| 128 | 58 | 84100 | 36 | 156700 | 97 | 19400 |
| 256 | 69 | 310600 | 40 | 918500 | 113 | 881300 |
| 512 | 77 | 1361000 | 43 | 4596000 | 130 | 353000 |
| 1024 | 87 | 5051000 | 50 | 5436000 | 146 | 14139000 |

REFERENCES

1. A. V. Anisimov and I. A. Zavadskyi, "Synchronous optical multipliers," Cybernetics and Systems Analysis, No. 4, 102-116 (2006).
2. C. Angulo Barrios, V. R. Almeida, R. R. Panepucci, et al., "Compact silicon tunable Fabry-Perot resonator with low power consumption," IEEE Photonics Technology Letters, 16, No. 2, 506-508 (2004).
3. K. Asakawa, Y. Sugimoto, and Y. Watanabe, "Photonic crystal and quantum dot technologies for all-optical switch and logic device," New Journal of Physics, 8 (208) (2006), http://ej.iop.org/links/r9p_wqx0w/_bybil78xgba Ynnav5vpA/njp6_9_08.pdf.
4. I. Zavadskyi, "Multiplication using switching elements," Visnyk Kyivskogo Univ., Ser. Fiz.-Mat. Nauk, No. 4, 145-156 (1999).
5. "Topology optimization of asymmetric Y-junction for air-bridge type photonic crystal slab waveguides," in: Proc. PECS-VII (2007), mpweb.ameslab.gov/pecsvii/abstrats/ WATANABEyoshinori.pdf.
6. F. Cuesta-Soto, A. Martinez, J. Garcia, et al., "All-optical switching structure based on a photonic crystal directional coupler," Optics Express, 12, No. 1, 161-167 (2004).
7. Ts. Shyh-Lin and Lu Chun-Yi, "BPM simulation and comparison of 1 x 2 directional waveguide coupling and Y-junction coupling silicon-on-insulator optical couplers," Fiber Integr. Opt., 21, No. 6, 417-433 (2002).
8. M. A. Kartsev and V. A. Brik, Computing Systems and Synchronous Arithmetics [in Russian], Radio i Svyaz, Moscow (1981).