WHEN the extended binary field GFð2 n Þ is viewed as a

Size: px
Start display at page:

Download "WHEN the extended binary field GFð2 n Þ is viewed as a"

Transcription

1 4 IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO., FEBRUARY 7 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan, Senior Member, IEEE Abstract Based on Toeplitz matrix-vector products and coordinate transformation techniques, we present a new scheme for subquadratic space complexity parallel multiplication in GFð n Þ using the shifted polynomial basis. Both the space complexity and the asymptotic gate delay of the proposed multiplier are better than those of the best existing subquadratic space complexity parallel multipliers. For example, with n being a power of, the space complexity is about percent better, while the asymptotic gate delay is about 33 percent better, respectively. Another advantage of the proposed matrix-vector product approach is that it can also be used to design subquadratic space complexity polynomial, dual, weakly dual, and triangular basis parallel multipliers. To the best of our knowledge, this is the first time that subquadratic space complexity parallel multipliers are proposed for dual, weakly dual, and triangular bases. A recursive design algorithm is also proposed for efficient construction of the proposed subquadratic space complexity multipliers. This design algorithm can be modified for the construction of most of the subquadratic space complexity multipliers previously reported in the literature. Index Terms Finite field, subquadratic space complexity multiplier, coordinate transformation, shifted polynomial basis, Toeplitz matrix. Ç 1 INTRODUCTION WHEN the extended binary field GFð n Þ is viewed as a vector space over GFðÞ, the elements of the field are often represented with respect to a basis. For the purpose of efficient field arithmetic, a number of bases have been proposed in the literature, for example, polynomial, normal, dual, weakly dual, triangular, and shifted polynomial bases. Between the two basic field arithmetic operations, namely, addition and multiplication, it is easier to implement the former using these representations. Multiplication is inherently complex and the existing bit parallel multipliers may be classified into the following two categories on the basis of the space complexity, which is often measured in terms of the number of -input AND and XOR gates: quadratic and subquadratic space complexity multipliers. Multipliers of [5], [6], [7], [], [9], [1], [11], [1], [13], [14], [15], [16], [17], [1], [19], [] belong to the former category and those of [1], [], [3], [4], [5], [6], [7], [], [9], [3], [31], [3] the latter category. The subquadratic space complexity multiplier provides a practical solution for large values of n due to its low space complexity. To this end, the polynomial version of the integer Karatsuba multiplication algorithm [1] has been widely used, e.g., [1], [], [3], [4], [5], [6], [7], [], [9], [3]. These multipliers first perform a multiplication of two binary polynomials, each of degree n 1 or less, and then a modulo reduction operation using the field generating. The authors are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada. hfan@vlsi.uwaterloo.ca, ahasan@ece.uwaterloo.ca. Manuscript received 4 Jan. 6; revised 3 June 6; accepted 4 July 6; published online Dec. 6. For information on obtaining reprints of this article, please send to: tc@computer.org, and reference IEEECS Log Number TC irreducible polynomial. For practical applications, n may not be an even integer. In such cases, in order to apply the Karatsuba algorithm, some slight modifications are often made to the inputs, e.g., padding zeros. The asymptotic gate delay of the Karatsuba algorithm is ð3 log n 1ÞT X þ T A for n ¼ i ði>1þ, where T A and T X are the delay of one -input AND and XOR gates. The Karatsuba algorithm-based multiplier has a low asymptotic space complexity, but its gate delay is three times more than that of the existing fast parallel multipliers [31]. In order to reduce this delay, a hybrid structure has been developed in [5]. Although this method does not improve the asymptotic gate delay, it is effective for some finite fields, e.g., GFð 33 Þ. The Winograd short convolution algorithm may be viewed as a generalization of the original Karatsubabased algorithm [], [31], and [33]. For n ¼ 3 i ði>þ, the algorithm improves the asymptotic gate delay of the Karatsuba algorithm in [1] and results in an asymptotic gate delay of ð4 log 3 n 1ÞT X þ T A ð:5 log n 1ÞT X þ T A. But, the asymptotic space complexity of this method is slightly worse than that of the Karatsuba-based multiplier. The parallel multiplier presented in [3] is based on the Chinese Remainder Theorem CRT) and Montgomery s algorithm. It allows the use of the extension field, for which an irreducible trinomial or special pentanomial does not exist. In summary, all of the above subquadratic space complexity GFð n Þ parallel multipliers are based on the improved polynomial multiplication algorithms and will be referred to as the polynomial-based multipliers in the following. In this paper, we follow a different approach using Toeplitz matrix-vector products and propose a new scheme for the subquadratic space complexity GFð n Þ parallel 1-934/7/$5. ß 7 IEEE Published by the IEEE Computer Society

2 FAN AND HASAN: A NEW APPROACH TO SUBQUADRATIC SPACE COMPLEXITY PARALLEL MULTIPLIERS FOR EXTENDED BINARY FIELDS 5 multiplier. Our scheme takes advantage of a shifted polynomial basis SPB) and applies the coordinate transformation technique of [3] and [4]. The end result is that not only the space complexity of the new multiplier is lower than that of the existing polynomial-based multipliers, e.g., [1] and [31], its asymptotic gate delay is also better, in fact considerably better, than that of these polynomial-based designs. Another advantage of the proposed matrix-vector product approach is that it can also be used to design subquadratic space complexity parallel multipliers using polynomial, dual, weakly dual, or triangular basis. Because of the lack of apparent regularity, hardware implementations of subquadratic space complexity parallel multipliers requires considerable efforts. For large n, efficient design of subquadratic space complexity multipliers is often based on recursive application of the divideand-conquer technique and is not straightforward. Therefore, a systematic design approach is desirable. In this paper, a recursive design algorithm is also proposed for an efficient construction of the proposed subquadratic space complexity multipliers. We have implemented the algorithm using ANSI C. The program generates a set of explicit Boolean statements involving only assignment, AND, and XOR operations. Therefore, it essentially provides a lower level of abstraction, e.g., the gate level description as in VHDL and Verilog. The proposed design algorithm may also be modified for the construction the Karatsuba-based subquadratic space complexity multipliers. The remainder of this paper is organized as follows: In Section, we consider asymptotic complexities for computing Toeplitz matrix-vector products based on two and three-way splits. The proposed multipliers are presented in Section 3. The recursive algorithm for the efficient construction of the proposed subquadratic space complexity multiplier is introduced in Section 4. Finally, concluding remarks are made in Section 5. ASYMPTOTIC COMPLEXITIES OF TOEPLITZ MATRIX-VECTOR PRODUCT In this section, some basic noncommutative matrix-vector multiplication schemes and their asymptotic space and gate delay complexities are presented. Elements of the matrix are in GF ðþ. These schemes will be used to design the proposed parallel multiplier in the next section. Definition 1. An n n Toeplitz matrix is a matrix ðm k;i Þ, where i; k n 1, with the property that m k;i ¼ m k 1;i 1, where 1 i; k n 1. Remark 1. An n n Toeplitz matrix is determined by the n 1 elements of the first row and the first column and adding two n n Toeplitz matrices requires n 1 addition operations..1 Two-Way Split or n ¼ i ði>þ Assume that T is an n n Toeplitz matrix and V an n 1 column vector. Matrix T and vector V can be split as follows: T ¼ T 1 T and V ¼ V ; T T 1 V 1 where T, T 1, and T are ðn=þðn=þ matrices and are individually in Toeplitz form, and V and V 1 are ðn=þ1 column vectors. Now, the following noncommutative formula can be used to compute the Toeplitz matrix-vector product TV []: TV ¼ T 1 T V ¼ P þ P ; ð1þ T T 1 V 1 P 1 þ P where P ¼ðT þ T 1 ÞV 1 ;P 1 ¼ðT 1 þ T ÞV and P ¼ T 1 ðv þ V 1 Þ. Please note that the addition and subtraction are the same in fields of characteristic two. One important implication of 1) is that the product of an n n Toeplitz matrix and an n 1 vector is primarily reduced to three products of matrix and vector of sizes ðn=þðn=þ and ðn=þ1. Remark. A straightforward matrix addition to obtain T þ T 1 and T 1 þ T requires a total of ðn 1Þ XOR gates. Owing to the special structure of the Toeplitz matrix, some terms may be reused in computing T þ T 1 and T 1 þ T. Suppose that we have obtained the summation T þ T 1. Then, we only need to compute the first column of T 1 þ T since the last n= 1 elements in the first row of T 1 þ T also appear in the first column of T þ T 1. Therefore, a total of 3n= 1 XOR gates are required to compute T þ T 1 and T 1 þ T. Let symbols S and D stand for Space and Delay, respectively. We will use S b ðnþ, S b ðnþ, D b ðnþ, and D b ðnþ to denote the number of multiplication and addition operations, the time delays introduced by multiplication and addition operations for the case of n ¼ b i ði>þ, respectively. The following recurrence relations, which describe the algorithm complexities, can be established when this formula is used recursively to compute TV in the case of n ¼ i. S ðþ ¼3; D ðþ ¼1; S ðnþ ¼3S ðn=þ; D ðnþ ¼D ðn=þ; S ðþ ¼5; D ðþ ¼; S ðnþ and ¼3S ðn=þþ3n 1; D ðnþ ¼D ðn=þþ: In order to obtain the explicit complexities of the above recurrence relations, we need the following lemmas. Proofs of these lemmas are simple and not given here. Lemma 1. Let a, b, and i be positive integers. Let n ¼ b i, a 6¼ b, and a 6¼ 1. The solution to the recurrence relations R 1 ¼ e; R n ¼ ar n=b þ cn þ d; is R n ¼ a i e þ bc ð ai b i Þ þ dai ð 1Þ a b a 1 :

3 6 IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO., FEBRUARY 7 Lemma. Let b and i be positive integers and n ¼ b i. The solution to the recurrence relations R 1 ¼ ; R n ¼ R n=b þ d; is R n ¼ di ¼ dlog b n. Now, it is easy to obtain the following complexity results for computing TV in the case of n ¼ i ði>þ: S ðnþ ¼nlog 3 ; >< S ðnþ ¼5:5nlog 3 6n þ :5; D ðnþ ¼1; D ðnþ ¼ log n:. Three-Way Split or n ¼ 3 i ði>þ As stated earlier, assume that T is an n n Toeplitz matrix and V is an n 1 column vector. Similarly to the case n ¼ i ði >Þ, we may have a three-way split of matrix T and vector V and obtain the following noncommutative formula which computes the Toeplitz matrix-vector product TV []: 1 T T 1 T TV T 3 T T 1 A V 1 1 P þ P 3 þ P V 1 A P 1 þ P 3 þ P 5 A; T 4 T 3 T V P þ P 4 þ P 5 where T i ð i 4Þ are ðn=3þðn=3þ Toeplitz matrices, and < P ¼ðT þ T 1 þ T ÞV ; P 1 ¼ðT 1 þ T þ T 3 ÞV 1 ; : P ¼ðT þ T 3 þ T 4 ÞV ; < P 3 ¼ T 1 ðv 1 þ V Þ; P 4 ¼ T ðv þ V Þ; : P 5 ¼ T 3 ðv þ V 1 Þ: Based on Remark 1, additions of matrices in ) may require as many as 6ð n 3 1Þ XOR gates. However, by reusing repeated terms, the number of XOR gates can be considerably reduced. To this effect, we state the following lemma: Lemma 3. Matrix additions ðt þ T 1 þ T Þ, ðt 1 þ T þ T 3 Þ, and ðt þ T 3 þ T 4 Þ in ) can be performed using a total of n 1 two-input XOR gates. Proof. Let n ¼ 3m and the first row and the first column of the Toeplitz matrix T be ðt 3m 1 ;t 3m ; ;t Þ and ðt 3m 1 ;t 3m ; ;t 6m Þ T, respectively. There is a one-toone correspondence between Toeplitz matrix T and P polynomial 6m t i x i. Adding two n n Toeplitz matrices requires the same number of XOR gates as adding the corresponding polynomials of degree n. Therefore, we have the following polynomials corresponding to Toeplitz matrices T, T 1, T, T 3, and ðþ m T 4 : q ¼ X t i x i ; m q 1 ¼ X t iþm x i ; m q ¼ X t iþm x i ; m q 3 ¼ X t iþ3m x i ; and q 4 ¼ P m t iþ4m x i, respectively. Now, we can write q þ q 1 þ q ¼ Xm ðt i þ½t mþi þ t mþi ŠÞx i þðt m 1 þ½t m 1 þ t 3m 1 ŠÞx m 1 m þ X ð½t i þ t mþi Šþt mþi Þx i ; i¼m q 1 þ q þ q 3 ¼ Xm ft mþi þ t mþi þ t 3mþi gx i þð½t m 1 þ t 3m 1 Šþt 4m 1 Þx m 1 m þ X ðt mþi þ½t mþi þ t 3m 1 ŠÞx i ; i¼m q þ q 3 þ q 4 ¼ Xm ft mþi þ t 3mþi þ t 4mþi gx i þðt 3m 1 þ t 4m 1 þ t 5m 1 Þx m 1 m þ X ð½t mþi þ t 3mþi Šþt 4mþi Þx i : i¼m In 3), 4), and 5), reuses of terms occur in the following five cases: 1. Term ½t m 1 þ t 3m 1 Š in q þ q 1 þ q also appears in q 1 þ q þ q 3.. In q þ q 1 þ q, summations ½t mþi þ t mþi Šði m Þ and ½t i þ t mþi Š ðm i m Þ are the same. 3. Summations ft mþi þ t mþi þ t 3mþi g ð i m Þ in q 1 þ q þ q 3 are the same as summations ðt i þ t mþi þ t mþi Þðmim Þ in q þ q 1 þ q. 4. Summations ft mþi þ t 3mþi þ t 4mþi g ð i m Þ in q þ q 3 þ q 4 are the same as summations ðt mþi þ t mþi þ t 3mþi Þ ðm i m Þ in q 1 þ q þ q Summations ½t mþi þ t 3mþi Šðmim Þ appear in both q 1 þ q þ q 3 and q þ q 3 þ q 4. Thus, q þ q 1 þ q, q 1 þ q þ q 3, and q þ q 3 þ q 4 can be computed using n 1 XOR gates. tu This lemma is useful in determining the space complexity of the matrix-vector product TV. In order to determine the gate delay, we can consider any of the three summation terms on the right hand side of ), i.e., ð3þ ð4þ ð5þ

4 FAN AND HASAN: A NEW APPROACH TO SUBQUADRATIC SPACE COMPLEXITY PARALLEL MULTIPLIERS FOR EXTENDED BINARY FIELDS 7 P þ P 4 þ P 5 ¼ ½ðT þ T 3 þ T 4 ÞV Š þ ½T ðv þ V ÞþT 3 ðv þ V 1 ÞŠ: S ðn;p j ;p i Þ ¼ h i½h j S ðtþþk j p j t þ l j Šþk i n þ l i ; S ðn;p ¼ h i;p jþ j½h i S ðtþþk i p i t þ l i Šþk j n þ l j ; ð6þ For n ¼ 3, it is easy to see that computing the terms in the square brackets requires a gate delay of T A þ T X. Therefore, P þ P 4 þ P 5 may be obtained with a gate delay of T A þ 3T X. When this result and Lemma 3 are used recursively to compute TV, the following recurrence relations, which describe the algorithm complexities, can be established: S 3 ð3þ ¼6; D 3 ð3þ ¼1; S 3 ðnþ ¼6S 3 ðn=3þ; D 3 ðnþ ¼D 3 ðn=3þ; S 3 ð3þ ¼14; D 3 ð3þ ¼3; S 3 ðnþ and ¼6S 3 ðn=3þþ5n 1; D 3 ðnþ ¼D 3 ðn=3þþ3: After solving these recurrence relations, we obtain the following complexities for computing TV in the case of n ¼ 3 i ði>þ: S 3 ðnþ ¼nlog 3 6 ; >< S 3 ðnþ ¼4 5 nlog 3 6 5n þ 1 5 ; D 3 ðnþ ¼1; D 3 ðnþ ¼3 log 3 n:.3 Dealing with Arbitrary n In the previous two subsections, complexity results for the Toeplitz matrix-vector product are given for b ¼ and 3. Let us denote these two primes as p 1 ¼ and p ¼ 3. It is possible to find corresponding complexities for other small primes, say p 3 ¼ 5, p 4 ¼ 7; ;p w, by transposing [, Theorem 6, p. 17] the polynomial multiplication algorithms in [35] or [6]. Since we have complexity results for more than one prime, the following two questions arise while dealing with an arbitrary n: 1. If p i p j jn, where 1 i<jw, and complexity results for both p i and p j are available, how do we choose a sequence of these to obtain a lower complexity scheme?. If none of the p i s ð1 i wþ are factors of n, how can these complexity results be applied? We first discuss question 1. Let primes p j and p i divide n and ðn; p j ;p i Þ denote the Toeplitz matrix-vector product algorithm of size n that first applies the formula corresponding to p i and then that corresponding to p j. As for the XOR gate complexity, we assume that the following recurrence relations have been established for p i and p j -way splits: S p i ðnþ ¼h i S p i ðn=p i Þþk i n þ l i ; S p j ðnþ ¼h j S p j ðn=p j Þþk j n þ l j : Let n ¼ p i p j t ðt >Þ. Then, the XOR gate complexities of algorithms ðn; p j ;p i Þ and ðn; p i ;p j Þ are described as follows: where S ðtþ denotes the number of XOR gates required to compute the Toeplitz matrix-vector product of size t. Similarly to the above discussion on the XOR gate complexity, it can be easily shown that time and the AND gate complexities of algorithms ðn; p j ;p i Þ and ðn; p i ;p j Þ are the same. Since the XOR gate complexities presented in 6) involve parameters p i and p j, one can make a comparison of XOR gate complexities to select one of ðn; p j ;p i Þ and ðn; p i ;p j Þ. As an example, we now consider the special case of n ¼ 6t. The XOR gate complexities of algorithms ðn; 3; Þ and ðn; ; 3Þ are 63t 4 þ 1 S ðtþ and 66t 7 þ 1 S ðtþ, respectively. Since both algorithms ðn; ; 3Þ and ðn; 3; Þ have the same AND gate complexities and gate delays, algorithm ðn; 3; Þ is preferable. With regard to question, where none of the p i s ð1 i wþ divide n, one may choose one of the following two solutions: The first one is to pad one zero at the end of the vector and to extend the Toeplitz matrix from n n to ðn þ 1Þðn þ 1Þ by inserting zeroes at positions ð;nþ and ðn; Þ. Then, apply the complexity results for p 1 ¼, p ¼ 3, etc. The second one is to delete the first row and the last column of the Toeplitz matrix and the last element of the vector and then apply the complexity results for p 1 ¼, p ¼ 3, etc. The deleted elements are processed separately. 3 NEW SUBQUADRATIC MULTIPLIERS In this section, we will use the above scheme of Toeplitz matrix-vector product to design subquadratic space complexity multipliers. For representing elements of the field GFð n Þ, we first consider a shifted polynomial basis, which can be viewed as a generalization of the standard polynomial basis. Let x be a root of fðuþ and GFð n Þ¼GFðÞ½uŠ=ðfðuÞÞ.A shifted polynomial basis SPB) of GFð n Þ over GFðÞ is defined as follows []: Definition. Let v be an integer and the ordered set M ¼ fx i j i n 1g be a polynomial basis of GFð n Þ over GFðÞ. The ordered set x v M :¼ fx i v j i n 1g is called a shifted polynomial basis with respect to M. 3.1 Formulation Using SPB Let X ¼ðx v ;x vþ1 ; ;x n v 1 Þ T be the column vector of SPB basis elements, A ¼ða ;a 1 ; ;a n 1 Þ T be the coordinate column vector of the field element a ¼ x v P n 1 a ix i, and B, C, and D be defined similarly. For hardware implementation of a GFð n Þ SPB parallel multiplier, one method is to form a binary n n matrix Z, which depends on b and fðuþ, and then perform a matrix-vector product. Namely, the product c ¼ ab may be computed as follows c ¼ Xn 1 a i x i v b ¼ðx v b; ;x 1 b; b; xb; ;x n v 1 bþa ¼ X T ðz ; ;Z n 1 ÞA ¼ X T ZA; ð7þ

5 IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO., FEBRUARY 7 where Z i is the coordinate column vector of x i v b with respect to the SPB ð i n 1Þ and Z is an n n matrix. From 7), we have the matrix-vector product C ¼ ZA. However, Z is not generally a Toeplitz matrix. Therefore, the subquadratic scheme presented in the previous section cannot be used directly. In [3] and [4], coordinate transformation techniques were proposed. Using this technique, one may first transform Z into a Toeplitz matrix T, i.e., T ¼ UZ, where U is the transform matrix. Then use the subquadratic scheme to compute the Toeplitz matrixvector product, D ¼ TA: Finally, the result C is obtained by C ¼ U 1 D: In the following, we will apply this idea to an arbitrary irreducible trinomial fðuþ ¼u n þ u k þ 1 ð1 k n 1Þ and a special type of pentanomials fðuþ ¼u n þ u kþ1 þ u k þ u k 1 þ 1 ð1 <k<n 1Þ and present exact expressions of C for GFð n Þ generated by these special types of irreducible polynomials. Please note that, for all practical purposes, one may only need to consider irreducible trinomials and pentanomials since at least one of the two types of irreducible polynomials is known to exist for every value of n in the range 1 <n<11. In fact, there is no known value of n for which an irreducible polynomial of weight w<6 does not exist [34]. Also note that NIST has recommended five finite fields of characteristic two for the ECDSA Elliptic Curve Digital Signature Algorithm) applications: GFð 163 Þ, GFð 33 Þ, GFð 3 Þ, GFð 49 Þ, and GFð 571 Þ, but no irreducible trinomials exist for three degrees, that is, 163, 3, and 571. For these three fields, we have found all pairs of ðn; kþ for which fðuþ ¼u n þ u kþ1 þ u k þ u k 1 þ 1 is irreducible [36]: 163, 67), 163, 69), 163, 71), 163, 9), 163, 94), 163, 96), 3, 4), 3, 133), 3, 15), 3, 59), 571, 14), 571, 3), 571, 341), and 571, 467). We assume that the value of v is equal to k in the definition of SPB for irreducible trinomials fðuþ ¼ u n þ u k þ 1 ð1 k n 1Þ and irreducible pentanomials fðuþ ¼u n þ u kþ1 þ u k þ u k 1 þ 1 ð1 <k<n 1Þ. 3. New SPB Multipliers for General Irreducible Trinomials For irreducible trinomials, we have formed a simple transformation matrix to be used with SPB. This matrix is much simpler than what can be obtained using [3] and [4] and is given as follows: U ¼ I ðn vþðn vþ I vv ðþ ð9þ ; ð1þ where I vv is the v v identity matrix. Lemma 4. Let fðuþ ¼u n þ u v þ 1 ð1 v n 1Þ be an irreducible trinomial and Z and U be matrices defined in 7) and 1). Then, T ¼ UZ is a Toeplitz matrix. Proof. Let g ¼ x j v b ¼ P n 1 g ix i v ð j n Þ. Thus, the jth column of Z in 7), i.e., Z j, is the column vector consisting of the SPB coordinates of element g. Then, column Z jþ1 is the coordinate column vector of element xg ¼ Xn 1 g i x i vþ1 ¼ Xn g i x i vþ1 þ g n 1 ðx v þ 1Þ i¼1 ¼ g n 1 x v þ Xv 1 g i 1 x i v þðg v 1 þ g n 1 Þþ Xn 1 g i 1 x i v : Let bg and cxg be the two elements of GFð n Þ whose SPB coordinates form columns j and j þ 1 of matrix T, respectively. Because of premultiplication of U to Z, the lower n v rows respectively, the upper v rows) of Z become the upper n v rows respectively, the lower v rows) of the resultant matrix T ¼ UZ. Thus, we can write!! bg ¼ Xv 1 g i x i v x n v þ Xn 1 g i x i v x v ¼ n v 1 X i¼n v g v nþi x i þ and cxg ¼ g n 1 x v þ Xv 1 g i 1 x i v i¼1 i¼v n v 1 X i¼ v x n v g vþi x i ; þ ðg v 1 þ g n 1 Þþ Xn 1 g i 1 x i v ¼ g n 1 x n v þ ¼ n v 1 X i¼n vþ1 þðg v 1 þ g n 1 Þx v n v 1 X i¼n vþ1 g v n 1 i x i þ þðg v 1 þ g n 1 Þx v : g v n 1 i x i þ Xn v i¼ vþ1 g v 1þi x i x v n v 1 X i¼ vþ1 g v 1þi x i Careful comparison shows that elements at ði; jþ and ði þ 1;jþ 1Þ of matrix T are the same for the case i; j n. Therefore, T is a Toeplitz matrix. tu Now, we present an example to illustrate the above transformation. Let fx i 4 j i 6g be the SPB for the irreducible trinomial u 7 þ u 4 þ 1. Matrices Z and T are as follows: Z ¼ 1 b 4 þ b b 3 b b 1 b b 6 b 5 b 5 þ b 1 b 4 þ b b 3 b b 1 b b 6 b 6 þ b b 5 þ b 1 b 4 þ b b 3 b b 1 b b þ b 3 b 6 þ b b 5 þ b 1 b 4 þ b b 3 b b 1 ; b 1 b b 6 b 5 b 4 b 3 þ b 6 b þ b 5 B b b 1 b b 6 b 5 b 4 b 3 þ b 6 A b 3 b b 1 b b 6 b 5 b 4

6 FAN AND HASAN: A NEW APPROACH TO SUBQUADRATIC SPACE COMPLEXITY PARALLEL MULTIPLIERS FOR EXTENDED BINARY FIELDS 9 TABLE 1 Comparisons of Some Selected Subquadratic Multipliers for n ¼ b t and T ¼ 1 b 1 b b 6 b 5 b 4 b 3 þ b 6 b þ b 5 b b 1 b b 6 b 5 b 4 b 3 þ b 6 b 3 b b 1 b b 6 b 5 b 4 b 4 þ b b 3 b b 1 b b 6 b 5 : b 5 þ b 1 b 4 þ b b 3 b b 1 b b 6 B b 6 þ b b 5 þ b 1 b 4 þ b b 3 b b 1 b A b þ b 3 b 6 þ b b 5 þ b 1 b 4 þ b b 3 b b 1 It is clear that the transformation from Z to T requires no logic gates and the complexity to form Z was presented in [] as follows: Gate delay ¼ 1T X ; n 1 k 6¼ n; XOR gates ¼ n= k ¼ n: For U as given in 1), it is clear that C is obtained from D ¼ TA with no additional logic gates. Table 1 compares the asymptotic complexity of proposed constructions with those of the existing polynomial-based multipliers for the trinomial fðuþ ¼u n þ u k þ 1 ð1 k< dn=eþ, where n ¼ t or 3 t. Since no irreducible binary trinomial exists for the case jn, we will assume that the multiplication operation is performed in the ring GF ðþ½uš=ðfðuþþ, where fðuþ is a trinomial and n ¼ i ði>þ, so that the discussion of the asymptotic complexity is meaningful. The parallel multiplier presented in [3] is based on the Chinese Remainder Theorem CRT) and Montgomery s algorithm. It allows the use of extension fields for which an irreducible trinomial or special pentanomial does not exist. Please note that the gate delay value given in [1] and [31] is ð3 log nþt X þ T A for the case n ¼ t. But, one T X gate delay may be saved for the case n ¼ since the gate delay for computing expressions in the square brackets of ða 1 x þ a Þðb 1 x þ b Þ¼a 1 b 1 x þ ð½ða 1 þ a Þðb 1 þ b ÞŠ þ½a 1 b 1 þ a b ŠÞx þ a b is T A þ T X. However, this does not improve the asymptotic gate delay since the overlapping occurs when n>. Similarly, one T X gate delay may also be saved for the case of n ¼ 3. We also note that, for the case dn=e <k<n, multipliers presented in [1] and [31] require more XOR gates and delays than values listed in the table since more than two reduction operations are performed [6]. 3.3 New SPB Multipliers for Special Pentanomials fðuþ ¼u n þ u vþ1 þ u v þ u v 1 þ 1 ð1 <v<n 1Þ For this special type of pentanomial, we transform Z into the Toeplitz matrix T via the following lemma. Lemma 5. Let fðuþ ¼u n þ u vþ1 þ u v þ u v 1 þ 1 ð1 <v< n 1Þ be an irreducible pentanomial and Z be the matrix defined in 7). Let matrix U ¼ I ðn vþðn vþ þ J ðn vþðn vþ I vv þ Jvv T ; where J vv is a v v matrix with the single entry ð;v 1Þ ¼ 1 and all remaining entries are. Then, T ¼ UZ is a Toeplitz matrix. Proof. Let g ¼ x j v b ¼ P n 1 g ix i v ð j n Þ. Thus, the jth column of Z in 7), i.e., Z j, is the column vector consisting of the SPB coordinates of element g. Then, column Z jþ1 is the coordinate column vector of element xg ¼ Xn 1 g i x i vþ1 ¼ Xn g i x i vþ1 þ g n 1 ðx v þ x 1 þ 1 þ xþ: Let bg and cxg be the two elements of GFð n Þ whose SPB coordinates form columns j and j þ 1 of matrix T, respectively. Because of premultiplication of U to Z, we can write

7 3 IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO., FEBRUARY 7 and bg ¼ Xv g i x i v þðg v 1 þ g Þx 1 x n v þ ðg v þ g n 1 Þþ Xn 1 g i x i v ¼ðg v þ g n 1 Þx v þ Xn 1 þðg þ g v 1 Þx n v 1 ; x v g i x i v þ Xv g i x iþn v cxg ¼ g n 1 x v þ Xv 3 g i x i vþ1 þðg v þ g n 1 þ g n 1 Þx n 1 i¼1 x n v þ ðg v 1 þ g n 1 þ g n Þþðg v þ g n 1 Þx þ Xn 1 g i x i vþ1 ¼ðg v 1 þ g n þ g n 1 Þx v þðg v þ g n 1 Þx vþ1 þ Xn 1 g i x i vþ1 þ Xv g i x iþn vþ1 : x v Careful comparison shows that elements at ði; jþ and ði þ 1;jþ 1Þ of matrix T are the same for the case i; j n. Therefore, T is a Toeplitz matrix. tu Row operations described in the above lemma are as follows: 1. XOR row to row v 1;. XOR row n 1 to row v; 3. place the lower n v rows on the top of the upper v rows. Therefore, instead of computing C ¼ðc ;c 1 ; ;c n 1 Þ T ¼ Zða ;a 1 ; ;a n 1 Þ T ; we compute D ¼ðd ;d 1 ; ;d n 1 Þ T ¼ Tða ;a 1 ; ;a n 1 Þ T first, where c v þ c n 1 i ¼ ; >< c d i ¼ iþv 1 i n v 1; c iþv n n v i n ; c v 1 þ c i ¼ n 1: Then, we obtain coordinates of C from D as follows and it requires only two XOR gates: d n vþi i v ; >< d c i ¼ n 1 þ d n v i ¼ v 1; d n v 1 þ d i ¼ v; d i v v þ 1 i n 1: T ;i ¼ b v i i v 1; b v þ b n 1 i ¼ v; >< b v 1 þ b n þ b n 1 i ¼ v þ 1; b v i þ b nþv 1 i þ b nþv i þ b nþvþ1 i v þ i v; b nþv 1 i þ b nþv i þ b nþvþ1 i þ b nþv i v þ 1 i n ; b v þ b vþ1 þ b vþ þ b vþ1 þ b n 1 i ¼ n 1; T i; ¼ b vþi i n v 1; b v nþi n v i n v ; b þ b v 1 i ¼ n v 1; >< b þ b 1 þ b v i ¼ n v; b v nþi þ b v 1 nþi þ b v nþi n v þ 1 i n 3; þb vþ1 nþi b þ b v 3 þ b v þ b v 1 þ b v i ¼ n ; b 1 þ b v þ b v 1 þ b v þ b v 1 i ¼ n 1: Remark 3. Since some signals may be reused, a total of 3T X delays and no more than 5 n -input XOR gates are required to compute all elements of this Toeplitz matrix. 3.4 Use of Equally Spaced Pentanomials By carefully choosing other types of irreducible pentanomials, it is possible to reduce the space and time complexities for generating the Toeplitz matrix T and for obtaining the final product vector C. For example, consider a very special class of pentanomials of the form fðuþ ¼u 4s þ u 3s þ u s þ u s þ 1 of degree n ¼ 4s with s>and v ¼ s. Such an s-equally-spaced pentanomial is irreducible if s ¼ 5 i ði Þ [1]. These equally-spaced irreducible pentanomials are not that abundant; however, they can reduce the space and time complexities for obtaining T and C. For example, applying the following transformation matrix, the use of such an equally-spaced irreducible pentanomial of degree n requires :75n XOR gates and 1T X delay for T and :5n XOR gates and 1T X delay for C In n þ K! n n ; In n þ KT n n where Kn n is an n n matrix with its entry at ði; i þ n 4 Þ¼1 for i n 4 1 and all remaining entries being. Slightly different complexity results, namely, :75n XOR gates and 1T X delay for T and :5n XOR gates and T X delay for C, are obtained using the following transformation matrix: 1 In 4 n 4 In 4 n In 4 4 B n 4 In 4 n A : 4 In 4 n In 4 4 n 4 For the case 3 <v<ðn 3Þ=, we summarize explicit expressions of the first row and the first column of matrix T below. They are obtained by applying the above transformations on the coordinate expressions of Z in [36]. 3.5 Considerations for Other Bases While there appears to be no scheme known to directly use the well-known Karatsuba algorithm to design the subquadratic space complexity dual, weakly dual, and

8 FAN AND HASAN: A NEW APPROACH TO SUBQUADRATIC SPACE COMPLEXITY PARALLEL MULTIPLIERS FOR EXTENDED BINARY FIELDS 31 triangular basis parallel multipliers [3], [4], [1], and [19], the proposed matrix-vector product approach can be used for these bases. Namely, for a, b GFð n Þ, let a be represented with respect to a polynomial basis and b be by the dual, weakly dual, or triangular basis of the polynomial basis. Then, the product c ¼ ab can be written as a matrix-vector product C ¼ Hða ;a 1 ; ;a n 1 Þ T, where H depends on b and the field generating irreducible polynomial. Explicit expressions of entries of H and the complexity to compute H can be found in the above corresponding references. In these cases, H is not a Toeplitz matrix, but a Hankel matrix, i.e., entries at ði; jþ and ði 1;jþ 1Þ are equal. We may first exchange columns H i and H n 1 i for i<n= and reverse the column vector ða ;a 1 ; ;a n 1 Þ T. Then, perform the Toeplitz matrix-vector product. An alternative is to obtain the Hankel matrixvector product formulae, which are similar to those in Section and have the same asymptotic complexities. It is noted that no coordinate transformation is required for these parallel multipliers, which use the polynomial basis to represent input a and use the corresponding dual, weakly dual, or triangular basis to represent the other input b and the product c. 4 ALGORITHM FOR DESIGNING SUBQUADRATIC SPACE COMPLEXITY PARALLEL GFð n Þ MULTIPLIERS In order to design application-specific circuits, different levels of abstraction may be used to describe the hardware. Normally, a higher abstraction level provides more flexibility and a lower one provides a better performance. Based on the previous sections, we now present a recursive design algorithm for efficient construction of the proposed subquadratic space complexity multipliers. The Toeplitz matrix T is constructed first in the main program, then the recursive procedures are invoked, which output a set of explicit Boolean statements. These expressions involve only assignment, AND, and XOR operations. For example, T ½4Š½Š½Š½Š¼T½3Š½Š½Š½ŠT½3Š½Š½Š½1ŠT½3Š½Š½Š½Š and C½9Š ¼C½9ŠT½4Š½1Š½Š½ŠV½4Š½1Š½Š: Therefore, a lower level abstraction, e.g., the gate level in Verilog HDL, is provided for the design of the multiplier. We note that the proposed design algorithm may also be modified for the construction of the Karatsuba-based subquadratic space complexity multipliers. Algorithm A1: Design algorithm for the subquadratic space complexity multipliers. Input: Field generating irreducible polynomial fðuþ. Output: Program for computing c ¼ ab in GFð n Þ. { Clear the output vector C; Construct Toeplitz matrix T from B; Construct vector V from A; Toeplitz mvpðn; ; ; Þ; Perform the coordinate transformation. } Subprogram: Toeplitz mvpðint : ; flvl; fblk; posþ {/* : The size of the input vector. flvl: The calling level. fblk: The block number of the submatrix. pos: The final position that this mvp will XOR to. */ INT: cblk ¼, clvl ¼ flvl þ 1; IF ð ¼ 1Þ THEN { Print the sentence C½posŠ ¼C½posŠ ðm½flvlš½fblkš½š½šv½flvlš½fblkš½šþ; return;} IF ð ¼ mod Þ THEN{ // Print sentences for computing P submatrix T 1 þ T, which looks like T ½clvlŠ½cblkŠ½iŠ½jŠ ¼T ½flvlŠ½fblkŠ½iŠ½jŠ T ½flvlŠ½fblkŠ½iŠ½j þ =Š; where i; j =; 1 subvector V 1, which looks like V ½clvlŠ½cblkŠ½iŠ ¼V ½flvlŠ½fblkŠ½i þ =Š; where i =; Toeplitz mvpð=; clvl; cblk; posþ; cblk++; // End of computing P // Print sentences for computing P 1 submatrix T 1 þ T ; 1 subvector V ; Toeplitz mvpð=; clvl; cblk; pos þ =Þ; cblk++; // End of computing P 1 // Print sentences for computing P submatrix T 1 ; 1 subvector V þ V 1 ; Print sentences for operations PUSH vector P on the stack; Print sentences for operations Set vector P to ; Toeplitz mvpð=; clvl; cblk; posþ; cblk++; Print sentences for operations XOR P to P 1 and P ; // Please note that vector P is on the stack, // and vector P is in the position of P.; Print sentences for operations POP modified vector P from the stack; // End of computing P } ELSE IF ð ¼ mod 3Þ THEN{ // Sentences for 3jn are similar to those for jn. Omitted. } ELSE {// Padding Print sentences for padding zeroes after the last elements of matrix T s first row and first column, respectively; Print the sentence for padding a zero after the last element of vector V ;

9 3 IEEE TRANSACTIONS ON COMPUTERS, VOL. 56, NO., FEBRUARY 7 Print the sentence PUSHðC½pos þ ŠÞ; Toeplitz mvpð þ 1; flvl; fblk; posþ; Print the sentence POPðC½pos þ ŠÞ; } } 5 CONCLUSIONS A new scheme for the subquadratic space complexity parallel multiplier has been presented. Both the space complexity and the asymptotic gate delay of the proposed multiplier are lower than those of the best existing subquadratic space complexity parallel multipliers. For practical applications, the hybrid structure developed in [5] may also be modified to reduce the gate delay of the proposed multipliers at the cost of a slight increase in the space complexity. A recursive design algorithm has also been proposed for efficient construction of the proposed subquadratic space complexity multipliers. It may be modified for the construction of the Karatsuba-based subquadratic space complexity multipliers. We note that, although the proposed matrix-vector product approach may be used to design the subquadratic space complexity polynomial, shifted polynomial, dual, weakly dual, and triangular basis parallel multipliers, the gate delay of the SPB multiplier is always equal to or lower than those of other multipliers. For example, if we consider the irreducible trinomial fðuþ ¼u n þ u n 1 þ 1, then generating matrix T requires 1T X gate delays for the SPB, but at least ðlog nþt X for other bases. Finally, although the work presented here are primarily for hardware implementations, our Toeplitz matrix-vector product based approach can also be applied to software implementation using general purpose processors [37]. ACKNOWLEDGMENTS The authors thank the reviewers for their careful reading and useful comments. The work was supported in part by NSERC through grants awarded to Dr. Hasan. REFERENCES [1] A. Karatsuba and Y. Ofman, Multiplication of Multidigit Numbers on Automata, Soviet Physics-Doklady English translation), vol. 7, no. 7, pp , [] S. Winograd, Arithmetic Complexity of Computations. SIAM, 19. [3] M.A. Hasan and V.K. Bhargava, Division and Bit-Serial Multiplication over GF ðq m Þ, IEE Proc.-E, vol. 139, no. 3, pp. 3-36, May 199. [4] M.A. Hasan and V.K. Bhargava, Architecture for Low Complexity Rate-Adaptive Reed-Solomon Encoder, IEEE Trans. Computers, vol. 44, no. 7, pp , July [5] E.D. Mastrovito, VLSI Schemes for Multiplication over Finite Field GFð m Þ, Proc. Sixth Int l Conf. Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes AAECC-6), pp , 19. [6] B. Sunar and C.K. Koc, Mastrovito Multiplier for All Trinomials, IEEE Trans. Computers, vol. 4, no. 5, pp. 5-57, May [7] C.K. Koc and B. Sunar, Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields, IEEE Trans. Computers, vol. 47, no. 3, pp , Mar [] A. Halbutogullari and C.K. Koc, Mastrovito Multiplier for General Irreducible Polynomials, IEEE Trans. Computers, vol. 49, no. 5, pp , May. [9] S.O. Lee, S.W. Jung, C.H. Kim, J. Yoon, J. Koh, and D. Kim, Design of Bit Parallel Multiplier with Lower Time Complexity, Proc. Sixth Int l Conf. Information Security and Cryptology ICICS 3), pp , 3. [1] E. Savas, A.F. Tenca, and C.K. Koc, A Scalable and Unified Multiplier Scheme for Finite Fields GFðpÞ and GFð m Þ, Proc. Cryptographic Hardware and Embedded Systems CHES ), C.K. Koc and C. Paar, eds., pp. 77-9, Aug.. [11] C.-C. Wang, T.-K. Truong, H.-M. Shao, L.J. Deutsch, J.K. Omura, and I.S. Reed, VLSI Schemes for Computing Multiplications and Inverses in GFð m Þ, IEEE Trans. Computers, vol. 34, no., pp , Aug [1] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields GFð m Þ, IEEE Trans. Computers, vol. 41, no., pp , Aug [13] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, A Modified Massey-Omura Parallel Multiplier for a Class of Finite Fields, IEEE Trans. Computers, vol. 4, no. 1, pp. 17-1, Oct [14] A. Reyhani-Masoleh and M.A. Hasan, A New Construction of Massey-Omura Parallel Multiplier over GFð m Þ, IEEE Trans. Computers, vol. 51, no. 5, pp , May. [15] G. Drolet, A New Representation of Elements of Finite Fields GFð m Þ Yielding Small Complexity Arithmetic Circuits, IEEE Trans. Computers, vol. 47, no. 9, pp , Sept [16] R. Katti and J. Brennan, Low Complexity Multiplication in Finite Field Using Ring Representation, IEEE Trans. Computers, vol. 5, no. 4, pp , Apr. 3. [17] C. Negre, Quadrinomial Modular Arithmetic Using Modified Polynomial Basis, Proc. Int l Conf. Information Technology: Coding and Computing ITCC 5), vol. 1, pp , 5. [1] S.T.J. Fenn, M. Benaissa, and D. Taylor, GFð m Þ Multiplication and Division over the Dual Basis, IEEE Trans. Computers, vol. 45, no. 3, pp , Mar [19] H. Wu, M.A. Hasan, and I.F. Blake, New Low-Complexity Bit- Parallel Finite Field Multipliers Using Weakly Dual Bases, IEEE Trans. Computers, vol. 47, no. 11, pp , Nov [] H. Fan and Y. Dai, Fast Bit Parallel GFð n Þ Multiplier for All Trinomials, IEEE Trans. Computers, vol. 54, no. 4, pp , Apr. 5. [1] C. Paar, A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields, IEEE Trans. Computers, vol. 45, no. 7, pp , July [] C. Paar, P. Fleischmann, and P. Roelse, Efficient Multiplier Schemes for Galois Fields GFð 4n Þ, IEEE Trans. Computers, vol. 47, no., pp , Feb [3] M. Ernst, M. Jung, F. Madlener, S. Huss, and R. Blumel, A Reconfigurable System on Chip Implementation for Elliptic Curve Cryptography over GFð n Þ, Proc. Cryptographic Hardware and Embedded Systems CHES ), pp , 3. [4] M. Jung, F. Madlener, M. Ernst, and S. Huss, A Reconfigurable Coprocessor for Finite Field Multiplication in GFð n Þ, Proc. IEEE Workshop Heterogeneous Reconfigurable Systems on Chip,. [5] C. Grabbe, M. Bednara, J. Shokrollahi, J. Teich, and J. von zur Gathen, FPGA Designs of Parallel High Performance GFð 33 Þ Multipliers, Proc. Int l Symp. Circuits and Systems ISCAS 3), vol. II, pp. 6-71, 3. [6] A. Weimerskirch and C. Paar, Generalizations of the Karatsuba Algorithm for Efficient Implementations, ruhr-uni-bochum.de/imperia/md/content/texte/kaweb.pdf, 3. [7] F. Rodriguez-Henriquez and C.K. Koc, On Fully Parallel Karatsuba Multipliers for GFð m Þ, Proc. Int l Conf. Computer Science and Technology CST 3), pp , 3. [] A.E. Cohen and K.K. Parhi, Implementation of Scalable Elliptic Curve Cryptosystem Crypto-Accelerators for GFð m Þ, Proc. 13th Asilomar Conf. Signals, Systems, and Computers, vol. 1, pp , Nov. 4. [9] L.S. Cheng, A.M., and T.H. Yeap, Improved FPGA Implementations of Parallel Karatsuba Multiplication over GFð n Þ, Proc. 3rd Biennial Symp. Comm., 6. [3] J. von zur Gathen and J. Shokrollahi, Efficient FPGA-Based Karatsuba Multipliers for Polynomials over F, Proc. 1th Workshop Selected Areas in Cryptography SAC 5), 5.

10 FAN AND HASAN: A NEW APPROACH TO SUBQUADRATIC SPACE COMPLEXITY PARALLEL MULTIPLIERS FOR EXTENDED BINARY FIELDS 33 [31] B. Sunar, A Generalized Method for Constructing Subquadratic Complexity GFð k Þ Multipliers, IEEE Trans. Computers, vol. 53, no. 9, pp , Sept. 4. [3] J.-C. Bajard, L. Imbert, and G.A. Jullien, Parallel Montgomery Multiplication in GFð k Þ Using Trinomial Residue Arithmetic, Proc. 17th IEEE Symp. Computer Arithmetic ARITH 5), 5. [33] D.J. Bernstein, Multidigit Multiplication for Mathematicians, Advances in Applied Math., to appear. [34] G. Seroussi, Table of Low-Weight Binary Irreducible Polynomials, Technical Report HPL-9-135, Hewlett-Packard Laboratories, Palo Alto, Calif., HPL html, Aug [35] P.L. Montgomery, Five, Six, and Seven-Term Karatsuba-Like Formulae, IEEE Trans. Computers, vol. 54, no. 3, pp , Mar. 5. [36] H. Fan and M.A. Hasan, Fast Bit Parallel Shifted Polynomial Basis Multipliers in GFð n Þ, IEEE Trans. Circuits and Systems-I: Regular Papers, accepted. [37] H. Fan and M.A. Hasan, Alternative to the Karatsuba Algorithm for Software Implementation of GFð n Þ Multiplication, Technical Report CACR 6-13, Univ. of Waterloo, Waterloo, Canada, May 6. Haining Fan received the BSc degree in computer science and the MSc degree in military communications science, both from the Institute of Communications Engineering, Nanjing, China, in 199 and 1996, respectively. He received the PhD degree in computer science from Tsinghua University in 5. From 1996 to 4, he was with the Institute of China Electronic System Engineering Company. He is currently a postdoctoral fellow in the Department of Electrical and Computer Engineering at the University of Waterloo, Canada. His research interests include finite field arithmetic, computational complexity, and cryptography. M. Anwar Hasan received the BSc degree in electrical and electronic engineering and the MSc degree in computer engineering, both from the Bangladesh University of Engineering and Technology, in 196 and 19, respectively, and the PhD degree in electrical engineering from the University of Victoria in 199. From April to December of 199, he was a postdoctoral fellow at the University of Victoria. In January 1993, he became an assistant professor in the Department of Electrical and Computer Engineering, University of Waterloo. He was promoted to the rank of associate professor with tenure in 199 and then to full professor in. At the University of Waterloo, Dr. Hasan is also a member of the Centre for Applied Cryptographic Research, the Center for Wireless Communications, and the VLSI Research Group. From January to December of 1999, he was on sabbatical with Motorola Labs., Schaumburg, Illinois. Dr. Hasan is a recipient of the Raihan Memorial Gold Medal. At the University of Victoria, he was awarded the President s Research Scholarship four times. At the University of Waterloo, he received a Faculty of Engineering Distinguished Performance Award in and an Outstanding Performance Award in 4. He has served on the program and executive committees of several conferences. From to 4, he was an associate editor of the IEEE Transactions of Computers. He is a licensed professional engineer of Ontario. His current research interests include cryptographic hardware and embedded systems, dependable and secure computing, computer arithmetic and architecture, and computer and network security. He is a senior member of the IEEE.. For more information on this or any other computing topic, please visit our Digital Library at

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields 1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE July 19, 2006 Abstract Based on Toeplitz matrix-vector

More information

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields

A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields 1 A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields Haining Fan and M. Anwar Hasan Senior Member, IEEE January 5, 2006 Abstract Based on Toeplitz matrix-vector

More information

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases 1 Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases H. Fan and M. A. Hasan March 31, 2007 Abstract Based on a recently proposed Toeplitz

More information

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials Yin Li 1, Gong-liang Chen 2, and Xiao-ning Xie 1 Xinyang local taxation bureau, Henan, China. Email:yunfeiyangli@gmail.com, 2 School

More information

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach M A Hasan 1 and C Negre 2 1 ECE Department and CACR, University of Waterloo, Ontario, Canada 2 Team

More information

BSIDES multiplication, squaring is also an important

BSIDES multiplication, squaring is also an important 1 Bit-Parallel GF ( n ) Squarer Using Shifted Polynomial Basis Xi Xiong and Haining Fan Abstract We present explicit formulae and complexities of bit-parallel shifted polynomial basis (SPB) squarers in

More information

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases Arash Reyhani-Masoleh Department of Electrical and Computer Engineering The University of Western Ontario London, Ontario,

More information

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A new class of irreducible pentanomials for polynomial based multipliers in binary fields Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date

More information

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials C. Shu, S. Kwon and K. Gaj Abstract: The efficient design of digit-serial multipliers

More information

Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials

Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials 1 Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials Yin Li, Xingpo Ma, Yu Zhang and Chuanda Qi Abstract We present a new type of bit-parallel non-recursive Karatsuba multiplier over

More information

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER Indian Journal of Electronics and Electrical Engineering (IJEEE) Vol.2.No.1 2014pp1-6 available at: www.goniv.com Paper Received :05-03-2014 Paper Published:28-03-2014 Paper Reviewed by: 1. John Arhter

More information

Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation

Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation M A Hasan and C Negre Abstract We study Dickson bases for binary field representation Such representation

More information

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A new class of irreducible pentanomials for polynomial based multipliers in binary fields Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date

More information

Reducing the Complexity of Normal Basis Multiplication

Reducing the Complexity of Normal Basis Multiplication Reducing the Complexity of Normal Basis Multiplication Ömer Eǧecioǧlu and Çetin Kaya Koç Department of Computer Science University of California Santa Barbara {omer,koc}@cs.ucsb.edu Abstract In this paper

More information

PARALLEL MULTIPLICATION IN F 2

PARALLEL MULTIPLICATION IN F 2 PARALLEL MULTIPLICATION IN F 2 n USING CONDENSED MATRIX REPRESENTATION Christophe Negre Équipe DALI, LP2A, Université de Perpignan avenue P Alduy, 66 000 Perpignan, France christophenegre@univ-perpfr Keywords:

More information

Fast arithmetic for polynomials over F 2 in hardware

Fast arithmetic for polynomials over F 2 in hardware Fast arithmetic for polynomials over F 2 in hardware JOACHIM VON ZUR GATHEN & JAMSHID SHOKROLLAHI (200). Fast arithmetic for polynomials over F2 in hardware. In IEEE Information Theory Workshop (200),

More information

Efficient multiplication using type 2 optimal normal bases

Efficient multiplication using type 2 optimal normal bases Efficient multiplication using type 2 optimal normal bases Joachim von zur Gathen 1, Amin Shokrollahi 2, and Jamshid Shokrollahi 3 1 B-IT, Dahlmannstr. 2, Universität Bonn, 53113 Bonn, Germany gathen@bit.uni-bonn.de

More information

High Performance GHASH Function for Long Messages

High Performance GHASH Function for Long Messages High Performance GHASH Function for Long Messages Nicolas Méloni 1, Christophe Négre 2 and M. Anwar Hasan 1 1 Department of Electrical and Computer Engineering University of Waterloo, Canada 2 Team DALI/ELIAUS

More information

Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination

Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination Efficient Subquadratic Space Complexity Binary Polynomial Multipliers Based On Block Recombination Murat Cenk, Anwar Hasan, Christophe Negre To cite this version: Murat Cenk, Anwar Hasan, Christophe Negre.

More information

Efficient FPGA-based Karatsuba multipliers for polynomials over F 2

Efficient FPGA-based Karatsuba multipliers for polynomials over F 2 JOACHIM VON ZUR GATHEN & JAMSHID SHOKROLLAHI (2005). Efficient FPGA-based Karatsuba multipliers for polynomials over F2. In Selected Areas in Cryptography (SAC 2005), BART PRENEEL & STAFFORD TAVARES, editors,

More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

Montgomery Multiplier and Squarer in GF(2 m )

Montgomery Multiplier and Squarer in GF(2 m ) Montgomery Multiplier and Squarer in GF( m ) Huapeng Wu The Centre for Applied Cryptographic Research Department of Combinatorics and Optimization University of Waterloo, Waterloo, Canada h3wu@cacrmathuwaterlooca

More information

High-Speed Polynomial Basis Multipliers Over for Special Pentanomials. José L. Imaña IEEE

High-Speed Polynomial Basis Multipliers Over for Special Pentanomials. José L. Imaña IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 63, NO 1, JANUARY 2016 1 High-Speed Polynomial Basis Multipliers Over for Special Pentanomials José L Imaña Abstract Efficient hardware implementations

More information

Subquadratic space complexity binary field multiplier using double polynomial representation

Subquadratic space complexity binary field multiplier using double polynomial representation University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Subquadratic space complexity binary field multiplier using double

More information

Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields

Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields Efficient Digit-Serial Normal Basis Multipliers over Binary Extension Fields ARASH REYHANI-MASOLEH and M. ANWAR HASAN University of Waterloo In this article, two digit-serial architectures for normal basis

More information

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7, HIGH PERFORMANCE MONTGOMERY MULTIPLICATION USING DADDA TREE ADDITION Thandri Adi Varalakshmi Devi 1, P Subhashini 2 1 PG Scholar, Dept of ECE, Kakinada Institute of Technology, Korangi, AP, India. 2 Assistant

More information

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis 2015 IEEE 22nd Symposium on Computer Arithmetic New Bit-Level Serial GF 2 m ) Multiplication Using Polynomial Basis Hayssam El-Razouk and Arash Reyhani-Masoleh Department of Electrical and Computer Engineering

More information

Modular Multiplication in GF (p k ) using Lagrange Representation

Modular Multiplication in GF (p k ) using Lagrange Representation Modular Multiplication in GF (p k ) using Lagrange Representation Jean-Claude Bajard, Laurent Imbert, and Christophe Nègre Laboratoire d Informatique, de Robotique et de Microélectronique de Montpellier

More information

High Performance GHASH Function for Long Messages

High Performance GHASH Function for Long Messages High Performance GHASH Function for Long Messages Nicolas Méloni, Christophe Negre, M. Anwar Hasan To cite this version: Nicolas Méloni, Christophe Negre, M. Anwar Hasan. High Performance GHASH Function

More information

On the Number of Trace-One Elements in Polynomial Bases for F 2

On the Number of Trace-One Elements in Polynomial Bases for F 2 On the Number of Trace-One Elements in Polynomial Bases for F 2 n Omran Ahmadi and Alfred Menezes Department of Combinatorics & Optimization University of Waterloo, Canada {oahmadid,ajmeneze}@uwaterloo.ca

More information

Polynomial Multiplication over Finite Fields using Field Extensions and Interpolation

Polynomial Multiplication over Finite Fields using Field Extensions and Interpolation 009 19th IEEE International Symposium on Computer Arithmetic Polynomial Multiplication over Finite Fields using Field Extensions and Interpolation Murat Cenk Department of Mathematics and Computer Science

More information

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2) An Algorithm for Inversion in GF2 m Suitable for Implementation Using a Polynomial Multiply Instruction on GF2 Katsuki Kobayashi, Naofumi Takagi, and Kazuyoshi Takagi Department of Information Engineering,

More information

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics Outline Computer Arithmetic for Cryptography in the Arith Group Arnaud Tisserand LIRMM, CNRS Univ. Montpellier 2 Arith Group Crypto Puces Porquerolles, April 16 18, 2007 Introduction LIRMM Laboratory Arith

More information

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves International Journal of Network Security, Vol.11, No.3, PP.155 162, Nov. 2010 155 Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves Sameh M. Shohdy, Ashraf B.

More information

CHEBYSHEV polynomials have been an essential mathematical

CHEBYSHEV polynomials have been an essential mathematical IEEE TRANSACTIONS ON COMPUTERS, VOL X, NO X, MONTH YYYY 1 A Karatsuba-based Algorithm for Polynomial Multiplication in Chebyshev Form Juliano B Lima, Student Member, IEEE, Daniel Panario, Member, IEEE,

More information

Revisiting Finite Field Multiplication Using Dickson Bases

Revisiting Finite Field Multiplication Using Dickson Bases Revisiting Finite Field Multiplication Using Dickson Bases Bijan Ansari and M. Anwar Hasan Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Ontario, Canada {bansari,

More information

Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials

Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials 4 IEEE TRANSACTIONS ON COMPUTERS, VOL 50, NO, JULY 001 Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials Tong Zhang, Student Member, IEEE, and Keshab

More information

IEEE TRANSACTIONS ON COMPUTERS, VOL. 45, NO. 7, JULY 1996 ,*-1 2 PRELIMINARIES. 2.1 General Multiplication in GF(2") I"?

IEEE TRANSACTIONS ON COMPUTERS, VOL. 45, NO. 7, JULY 1996 ,*-1 2 PRELIMINARIES. 2.1 General Multiplication in GF(2) I? 856 IEEE TRANSACTIONS ON COMPUTERS, VOL. 45, NO. 7, JULY 1996 Fi chitecture for a Parallel Finite lier with Low Complexity Based on Composite Fields Christof Paar Abstract-In this paper a new bit-parallel

More information

SUBQUADRATIC BINARY FIELD MULTIPLIER IN DOUBLE POLYNOMIAL SYSTEM

SUBQUADRATIC BINARY FIELD MULTIPLIER IN DOUBLE POLYNOMIAL SYSTEM SUBQUADRATIC BINARY FIELD MULTIPLIER IN DOUBLE POLYNOMIAL SYSTEM Pascal Giorgi, Christophe Nègre 2 Équipe DALI, LP2A, Unversité de Perpignan avenue P. Alduy, F66860 Perpignan, France () pascal.giorgi@univ-perp.fr,

More information

Subquadratic Polynomial Multiplication over GF (2 m ) Using Trinomial Bases and Chinese Remaindering

Subquadratic Polynomial Multiplication over GF (2 m ) Using Trinomial Bases and Chinese Remaindering Subquadratic Polynomial Multiplication over GF (2 m ) Using Trinomial Bases and Chinese Remaindering Éric Schost 1 and Arash Hariri 2 1 ORCCA, Computer Science Department, The University of Western Ontario,

More information

On the distribution of irreducible trinomials over F 3

On the distribution of irreducible trinomials over F 3 Finite Fields and Their Applications 13 (2007) 659 664 http://www.elsevier.com/locate/ffa On the distribution of irreducible trinomials over F 3 Omran Ahmadi Department of Combinatorics and Optimization,

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID? Are standards compliant Elliptic Curve Cryptosystems feasible on RFID? Sandeep S. Kumar and Christof Paar Horst Görtz Institute for IT Security, Ruhr-Universität Bochum, Germany Abstract. With elliptic

More information

Efficient multiplication in characteristic three fields

Efficient multiplication in characteristic three fields Efficient multiplication in characteristic three fields Murat Cenk and M. Anwar Hasan Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 mcenk@uwaterloo.ca

More information

Sequential Multiplier with Sub-linear Gate Complexity

Sequential Multiplier with Sub-linear Gate Complexity Sequential Multiplier with Sub-linear Gate Complexity Anwar Hasan, Christophe Negre To cite this version: Anwar Hasan, Christophe Negre. Sequential Multiplier with Sub-linear Gate Complexity. [Research

More information

Design and Implementation of Efficient Modulo 2 n +1 Adder

Design and Implementation of Efficient Modulo 2 n +1 Adder www..org 18 Design and Implementation of Efficient Modulo 2 n +1 Adder V. Jagadheesh 1, Y. Swetha 2 1,2 Research Scholar(INDIA) Abstract In this brief, we proposed an efficient weighted modulo (2 n +1)

More information

A Pipelined Karatsuba-Ofman Multiplier over GF(3 97 ) Amenable for Pairing Computation

A Pipelined Karatsuba-Ofman Multiplier over GF(3 97 ) Amenable for Pairing Computation A Pipelined Karatsuba-Ofman Multiplier over GF(3 97 ) Amenable for Pairing Computation Nidia Cortez-Duarte 1, Francisco Rodríguez-Henríquez 1, Jean-Luc Beuchat 2 and Eiji Okamoto 2 1 Computer Science Departament,

More information

Parity of the Number of Irreducible Factors for Composite Polynomials

Parity of the Number of Irreducible Factors for Composite Polynomials Parity of the Number of Irreducible Factors for Composite Polynomials Ryul Kim Wolfram Koepf Abstract Various results on parity of the number of irreducible factors of given polynomials over finite fields

More information

SUFFIX PROPERTY OF INVERSE MOD

SUFFIX PROPERTY OF INVERSE MOD IEEE TRANSACTIONS ON COMPUTERS, 2018 1 Algorithms for Inversion mod p k Çetin Kaya Koç, Fellow, IEEE, Abstract This paper describes and analyzes all existing algorithms for computing x = a 1 (mod p k )

More information

Parallel Itoh-Tsujii Multiplicative Inversion Algorithm for a Special Class of Trinomials

Parallel Itoh-Tsujii Multiplicative Inversion Algorithm for a Special Class of Trinomials Parallel Itoh-Tsujii Multiplicative Inversion Algorithm for a Special Class of Trinomials Francisco Rodríguez-Henríquez 1, Guillermo Morales-Luna 1, Nazar A. Saqib 2 and Nareli Cruz-Cortés 1 (1) Computer

More information

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters International Journal of Engineering Research and Advanced Technology (IJERAT) DOI:http://dx.doi.org/10.31695/IJERAT.2018.3271 E-ISSN : 2454-6135 Volume.4, Issue 6 June -2018 Design and Comparison of Wallace

More information

Formulas for cube roots in F 3 m

Formulas for cube roots in F 3 m Discrete Applied Mathematics 155 (2007) 260 270 www.elsevier.com/locate/dam Formulas for cube roots in F 3 m Omran Ahmadi a, Darrel Hankerson b, Alfred Menezes a a Department of Combinatorics and Optimization,

More information

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials

Low complexity bit parallel multiplier for GF(2 m ) generated by equally-spaced trinomials Inforation Processing Letters 107 008 11 15 www.elsevier.co/locate/ipl Low coplexity bit parallel ultiplier for GF generated by equally-spaced trinoials Haibin Shen a,, Yier Jin a,b a Institute of VLSI

More information

Chapter 4 Mathematics of Cryptography

Chapter 4 Mathematics of Cryptography Chapter 4 Mathematics of Cryptography Part II: Algebraic Structures Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 4.1 Chapter 4 Objectives To review the concept

More information

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2016 FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their

More information

International Journal of Advanced Computer Technology (IJACT)

International Journal of Advanced Computer Technology (IJACT) AN EFFICIENT DESIGN OF LOW POWER,FAST EL- LIPTIC CURVE SCALAR MULTIPLIER IN ECC USING S Jayalakshmi K R, M.Tech student, Mangalam college of engineering,kottayam,india; Ms.Hima Sara Jacob, Assistant professor,

More information

Modular Reduction without Pre-Computation for Special Moduli

Modular Reduction without Pre-Computation for Special Moduli Modular Reduction without Pre-Computation for Special Moduli Tolga Acar and Dan Shumow Extreme Computing Group, Microsoft Research, Microsoft One Microsoft Way, Redmond, WA 98052, USA {tolga,danshu}@microsoft.com

More information

On A Large-scale Multiplier for Public Key Cryptographic Hardware

On A Large-scale Multiplier for Public Key Cryptographic Hardware 1,a) 1 1 1 1 1 Wallace tree n log n 64 128 Wallace tree,, Wallace tree,, VHDL On A Large-scale Multiplier for Public Key Cryptographic Hardware Masaaki Shirase 1,a) Kimura Keigo 1 Murayama Hiroyuki 1 Kato

More information

A New Algorithm to Compute Terms in Special Types of Characteristic Sequences

A New Algorithm to Compute Terms in Special Types of Characteristic Sequences A New Algorithm to Compute Terms in Special Types of Characteristic Sequences Kenneth J. Giuliani 1 and Guang Gong 2 1 Dept. of Mathematical and Computational Sciences University of Toronto at Mississauga

More information

Volume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at

Volume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at Volume 3, No 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at wwwjgrcsinfo A NOVEL HIGH DYNAMIC RANGE 5-MODULUS SET WHIT EFFICIENT REVERSE CONVERTER AND

More information

Implementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester,

More information

New Algebraic Decoding of (17,9,5) Quadratic Residue Code by using Inverse Free Berlekamp-Massey Algorithm (IFBM)

New Algebraic Decoding of (17,9,5) Quadratic Residue Code by using Inverse Free Berlekamp-Massey Algorithm (IFBM) International Journal of Computational Intelligence Research (IJCIR). ISSN: 097-87 Volume, Number 8 (207), pp. 205 2027 Research India Publications http://www.ripublication.com/ijcir.htm New Algebraic

More information

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography.

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography. Power Consumption Analysis General principle: measure the current I in the circuit Arithmetic Level Countermeasures for ECC Coprocessor Arnaud Tisserand, Thomas Chabrier, Danuta Pamula I V DD circuit traces

More information

Useful Arithmetic for Cryptography

Useful Arithmetic for Cryptography Arithmetic and Cryptography 1/116 Useful Arithmetic for Cryptography Jean Claude Bajard LIP6 - UPMC (Paris 6)/ CNRS France 8 Jul 2013 Arithmetic and Cryptography 2/116 Introduction Modern Public Key Cryptography

More information

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2017 Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

More information

Appendix A Finite or Galois Fields

Appendix A Finite or Galois Fields Appendix A Finite or Galois Fields Finite or Galois fields are used in the design and in the interpretation of the operation of some arithmetic and algebraic circuits. The axioms and the main properties

More information

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs Sujoy Sinha Roy, Chester Rebeiro and Debdeep Mukhopadhyay Department of Computer Science and Engineering

More information

THE discrete sine transform (DST) and the discrete cosine

THE discrete sine transform (DST) and the discrete cosine IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BIREFS 1 New Systolic Algorithm and Array Architecture for Prime-Length Discrete Sine Transform Pramod K. Meher Senior Member, IEEE and M. N. S. Swamy

More information

Fields in Cryptography. Çetin Kaya Koç Winter / 30

Fields in Cryptography.   Çetin Kaya Koç Winter / 30 Fields in Cryptography http://koclab.org Çetin Kaya Koç Winter 2017 1 / 30 Field Axioms Fields in Cryptography A field F consists of a set S and two operations which we will call addition and multiplication,

More information

Transition Faults Detection in Bit Parallel Multipliers over GF(2 m )

Transition Faults Detection in Bit Parallel Multipliers over GF(2 m ) Transition Faults Detection in Bit Parallel Multipliers over GF( m ) Hafizur Rahaman Bengal Engineering & Science University, Shibpur Howrah-73, India rahaman_h@it.becs.ac.in Jimson Mathew Computer Science

More information

Convergence of a linear recursive sequence

Convergence of a linear recursive sequence int. j. math. educ. sci. technol., 2004 vol. 35, no. 1, 51 63 Convergence of a linear recursive sequence E. G. TAY*, T. L. TOH, F. M. DONG and T. Y. LEE Mathematics and Mathematics Education, National

More information

Possible numbers of ones in 0 1 matrices with a given rank

Possible numbers of ones in 0 1 matrices with a given rank Linear and Multilinear Algebra, Vol, No, 00, Possible numbers of ones in 0 1 matrices with a given rank QI HU, YAQIN LI and XINGZHI ZHAN* Department of Mathematics, East China Normal University, Shanghai

More information

Cryptographically Robust Large Boolean Functions. Debdeep Mukhopadhyay CSE, IIT Kharagpur

Cryptographically Robust Large Boolean Functions. Debdeep Mukhopadhyay CSE, IIT Kharagpur Cryptographically Robust Large Boolean Functions Debdeep Mukhopadhyay CSE, IIT Kharagpur Outline of the Talk Importance of Boolean functions in Cryptography Important Cryptographic properties Proposed

More information

Division of Trinomials by Pentanomials and Orthogonal Arrays

Division of Trinomials by Pentanomials and Orthogonal Arrays Division of Trinomials by Pentanomials and Orthogonal Arrays School of Mathematics and Statistics Carleton University daniel@math.carleton.ca Joint work with M. Dewar, L. Moura, B. Stevens and Q. Wang

More information

Faster ECC over F 2. (feat. PMULL)

Faster ECC over F 2. (feat. PMULL) Faster ECC over F 2 571 (feat. PMULL) Hwajeong Seo 1 Institute for Infocomm Research (I2R), Singapore hwajeong84@gmail.com Abstract. In this paper, we show efficient elliptic curve cryptography implementations

More information

Hardware Implementation of Elliptic Curve Cryptography over Binary Field

Hardware Implementation of Elliptic Curve Cryptography over Binary Field I. J. Computer Network and Information Security, 2012, 2, 1-7 Published Online March 2012 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijcnis.2012.02.01 Hardware Implementation of Elliptic Curve Cryptography

More information

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases D. J. Bernstein University of Illinois at Chicago NSF ITR 0716498 Part I. Linear maps Consider computing 0

More information

GF(2 m ) arithmetic: summary

GF(2 m ) arithmetic: summary GF(2 m ) arithmetic: summary EE 387, Notes 18, Handout #32 Addition/subtraction: bitwise XOR (m gates/ops) Multiplication: bit serial (shift and add) bit parallel (combinational) subfield representation

More information

Finite Fields. SOLUTIONS Network Coding - Prof. Frank H.P. Fitzek

Finite Fields. SOLUTIONS Network Coding - Prof. Frank H.P. Fitzek Finite Fields In practice most finite field applications e.g. cryptography and error correcting codes utilizes a specific type of finite fields, namely the binary extension fields. The following exercises

More information

McBits: Fast code-based cryptography

McBits: Fast code-based cryptography McBits: Fast code-based cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands Joint work with Daniel Bernstein, Tung Chou December 17, 2013 IMA International Conference on Cryptography

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

GENERALIZED ARYABHATA REMAINDER THEOREM

GENERALIZED ARYABHATA REMAINDER THEOREM International Journal of Innovative Computing, Information and Control ICIC International c 2010 ISSN 1349-4198 Volume 6, Number 4, April 2010 pp. 1865 1871 GENERALIZED ARYABHATA REMAINDER THEOREM Chin-Chen

More information

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System G.Suresh, G.Indira Devi, P.Pavankumar Abstract The use of the improved table look up Residue Number System

More information

KEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA.

KEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA. GLOBAL JOURNAL OF ADVANCED ENGINEERING TECHNOLOGIES AND SCIENCES DESIGN OF A QUINARY TO RESIDUE NUMBER SYSTEM CONVERTER USING MULTI-LEVELS OF CONVERSION Hassan Amin Osseily Electrical and Electronics Department,

More information

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m ) A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,

More information

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography Selçuk Baktır, Berk Sunar {selcuk,sunar}@wpi.edu Department of Electrical & Computer Engineering Worcester Polytechnic Institute

More information

DESPITE considerable progress in verification of random

DESPITE considerable progress in verification of random IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS 1 Formal Analysis of Galois Field Arithmetic Circuits - Parallel Verification and Reverse Engineering Cunxi Yu Student Member,

More information

A High-Speed Realization of Chinese Remainder Theorem

A High-Speed Realization of Chinese Remainder Theorem Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching

More information

PAPER A Low-Complexity Step-by-Step Decoding Algorithm for Binary BCH Codes

PAPER A Low-Complexity Step-by-Step Decoding Algorithm for Binary BCH Codes 359 PAPER A Low-Complexity Step-by-Step Decoding Algorithm for Binary BCH Codes Ching-Lung CHR a),szu-linsu, Members, and Shao-Wei WU, Nonmember SUMMARY A low-complexity step-by-step decoding algorithm

More information

Optimal Use of Montgomery Multiplication on Smart Cards

Optimal Use of Montgomery Multiplication on Smart Cards Optimal Use of Montgomery Multiplication on Smart Cards Arnaud Boscher and Robert Naciri Oberthur Card Systems SA, 71-73, rue des Hautes Pâtures, 92726 Nanterre Cedex, France {a.boscher, r.naciri}@oberthurcs.com

More information

Secret Sharing for General Access Structures

Secret Sharing for General Access Structures SECRET SHARING FOR GENERAL ACCESS STRUCTURES 1 Secret Sharing for General Access Structures İlker Nadi Bozkurt, Kamer Kaya, and Ali Aydın Selçuk Abstract Secret sharing schemes (SSS) are used to distribute

More information

A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m )

A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m ) A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m ) Johann Großschädl Graz University of Technology Institute for Applied Information Processing and Communications Inffeldgasse

More information

Chapter 4 Finite Fields

Chapter 4 Finite Fields Chapter 4 Finite Fields Introduction will now introduce finite fields of increasing importance in cryptography AES, Elliptic Curve, IDEA, Public Key concern operations on numbers what constitutes a number

More information

Efficient Modular Arithmetic in Adapted Modular Number System using Lagrange Representation

Efficient Modular Arithmetic in Adapted Modular Number System using Lagrange Representation Efficient Modular Arithmetic in Adapted Modular Number System using Lagrange Representation Christophe Negre 1, Thomas Plantard 2 1 Team DALI, University of Perpignan, France. 2 Centre for Information

More information

Binary Convolutional Codes of High Rate Øyvind Ytrehus

Binary Convolutional Codes of High Rate Øyvind Ytrehus Binary Convolutional Codes of High Rate Øyvind Ytrehus Abstract The function N(r; ; d free ), defined as the maximum n such that there exists a binary convolutional code of block length n, dimension n

More information

Introduction to Algorithms

Introduction to Algorithms Lecture 1 Introduction to Algorithms 1.1 Overview The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that

More information

A new algorithm for residue multiplication modulo

A new algorithm for residue multiplication modulo A new algorithm for residue multiplication modulo 2 521 1 Shoukat Ali and Murat Cenk Institute of Applied Mathematics Middle East Technical University, Ankara, Turkey shoukat.1983@gmail.com mcenk@metu.edu.tr

More information

On the Cross-Correlation of a p-ary m-sequence of Period p 2m 1 and Its Decimated

On the Cross-Correlation of a p-ary m-sequence of Period p 2m 1 and Its Decimated IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 58, NO 3, MARCH 01 1873 On the Cross-Correlation of a p-ary m-sequence of Period p m 1 Its Decimated Sequences by (p m +1) =(p +1) Sung-Tai Choi, Taehyung Lim,

More information

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols Moncef Amara University of Paris 8 LAGA laboratory Saint-Denis / France Amar Siad University of Paris 8 LAGA

More information

New Minimal Weight Representations for Left-to-Right Window Methods

New Minimal Weight Representations for Left-to-Right Window Methods New Minimal Weight Representations for Left-to-Right Window Methods James A. Muir 1 and Douglas R. Stinson 2 1 Department of Combinatorics and Optimization 2 School of Computer Science University of Waterloo

More information