IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 25, NO. 5, MAY 2017, p. 1658

A General Digit-Serial Architecture for Montgomery Modular Multiplication

Serdar Süer Erdem, Tuğrul Yanık, and Anıl Çelebi

Abstract: The Montgomery algorithm is a fast modular multiplication method frequently used in cryptographic applications. This paper investigates the digit-serial implementations of the Montgomery algorithm for large integers. A detailed analysis is given and a tight upper bound is presented for the intermediate results obtained during the digit-serial computation. Based on this analysis, an efficient digit-serial Montgomery modular multiplier architecture using carry-save adders is proposed and its complexity is presented. In this architecture, pipelined carry-select adders are used to perform two final tasks: adding the carry-save vectors representing the modular product and subtracting the modulus from this sum if further reduction is needed. The proposed architecture can be designed for any digit size $\delta$ and modulus $\theta$. This paper also presents logic formulas for the bits of the precomputation $-\theta^{-1} \bmod 2^{\delta}$ used in the Montgomery algorithm for $\delta \le 8$. Finally, an evaluation of the proposed architecture on Virtex 7 FPGAs is presented.

Index Terms: Carry-save addition, carry-select addition, Montgomery modular multiplication, RSA cryptosystem.

I. INTRODUCTION

Modular multiplication with large integers is the main computation in many public key cryptosystems such as RSA [1] and elliptic curve cryptography (ECC) [2], [3]. Conventional modular multiplication is an expensive operation because it requires division. The Montgomery algorithm [4] is an efficient modular multiplication technique that replaces costly division with multiplication and bit shift operations. This algorithm uses a precomputation based on the modulus to achieve this improvement. Montgomery modular multiplication with a modulus $\theta$ can be implemented in two different ways.
1) The whole multiplier is multiplied by the multiplicand. Then, the resulting product is reduced by using the precomputation $\psi = -\theta^{-1} \bmod 2^{n}$, where $n$ is an integer not less than the multiplier bit length.
2) Multiplication and reduction steps are interleaved. The multiplier is multiplied by the multiplicand, $\delta$ bits at a time. After each multiplication, the partial product is reduced and accumulated. In reduction, the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta}$ is used.

Manuscript received June 7, 2016; revised October 2, 2016 and December 5, 2016; accepted January 5. Date of publication February 2, 2017; date of current version April 24. S. S. Erdem is with the Department of Electronics Engineering, Gebze Technical University, Gebze, Turkey (serdem@gtu.edu.tr). T. Yanık is with the Department of Computer Engineering, Celal Bayar University, Muradiye, Turkey (tugrul.yanik@cbu.edu.tr). A. Çelebi is with the Department of Electronics and Communication Engineering, Kocaeli University, İzmit, Turkey (anilcelebi@kocaeli.edu.tr). Digital Object Identifier /TVLSI

Hardware architectures of Montgomery modular multiplication that process the multiplier $\delta = 1$ bit at a time are abundant in the literature [5]-[14]. In RSA and ECC applications, the modulus $\theta$ is an odd integer. Thus, the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta} = 1$ always when $\delta = 1$. The advantages of these architectures are obvious. They do not need to compute and store the precomputation. Also, their area complexities are low because they process the multiplier bit by bit.

However, the hardware architectures with $\delta = 1$ have a major drawback. They need a large number of clock cycles to perform a single modular multiplication since they process the multiplier one bit at a time. When $\delta = 1$, at least $n/\delta = n$ clock cycles are needed to perform a modular multiplication, where $n$ is the bit length of the multiplier. However, the bit length $n$ of the multiplier is very large in many applications.
For example, typically $n = 1024$ or $n = 2048$ for RSA.

Hardware architectures with $\delta > 1$ have also been proposed in the literature to obtain speed improvement at the expense of area [12], [15]-[18]. These digit-serial implementations are frequently called radix-$2^{\delta}$ or high-radix Montgomery multiplication. A radix-4 architecture using lookup tables is proposed in [12] and a radix-8 architecture using Booth encoding is proposed in [15]. Unfortunately, it is very difficult to extend these architectures to support larger $\delta$. The architecture in [16] uses a special modulus $\theta$ so that the precomputation $\psi = \pm 1 \bmod 2^{\delta}$. The architecture in [17] uses the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta}$ for a general modulus $\theta$ and digit size $\delta$. However, this architecture computes not $n$-bit $\times$ $\delta$-bit partial products but $\delta$-bit $\times$ $\delta$-bit partial products in each clock cycle. Thus, it needs not $n/\delta$ but more than $(n/\delta)^2$ clock cycles to perform a modular multiplication. Also, the Montgomery multiplier in [18] converts one of the operands into a sparse integer representation by a method based on canonic signed digit recoding. Though the resulting sparse representation of the operand enables fast multiplication, the computation time becomes input dependent. Thus, it is very vulnerable to side-channel attacks.

This paper presents an efficient digit-serial hardware architecture for the Montgomery algorithm. As in many previous works, carry-save adders are used to accumulate partial products to avoid carry propagation delay. The proposed architecture does not put any restrictions on the modulus $\theta$ and digit size $\delta$, unlike [12], [15], and [16]. Also, it computes a modular multiplication in $n/\delta + c$ clock cycles for a small constant $c$. Because its computation time is fixed, it is less vulnerable to simple side-channel attacks, compared with the multipliers with input-dependent computation times in [13], [14], and [18].

In this paper, a thorough mathematical analysis of the Montgomery algorithm is made and the exact bit lengths of the carry-save vectors representing the intermediate variables are determined. Using this analysis, a detailed bit-level description of the proposed multiplier and a complete evaluation of its complexities are presented.

A final subtraction of the modulus $\theta$ can be needed at the end of the Montgomery algorithm. Thus, fast adders must be used in the multiplier to perform addition and subtraction with carry-save vectors. We show how to use carry-select adders (CSAs) to handle this final subtraction without increasing the critical path delay.

Also, we present the logic formulas for the bits of $\psi = -\theta^{-1} \bmod 2^{8} = f(\theta_0, \theta_1, \ldots, \theta_7)$ in terms of the modulus bits $\theta_i$. With these formulas, the precomputation $\psi$ can easily be computed for designs with digit size $\delta \le 8$. In practical applications, this digit size is sufficient and a larger digit size increases the area complexity considerably. The higher bits of the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta}$ can be calculated by the modular inversion algorithms in [19] and [20].

This paper is organized as follows. Section II introduces the Montgomery algorithm. Section III discusses the digit-serial computation of the Montgomery algorithm. Also, it presents a tight upper bound for intermediate results and shows the fast computation of the required precomputation. Section IV proposes a fast digit-serial architecture for Montgomery multiplication. Section V presents the complexity analysis and FPGA implementation results. Section VI discusses conclusions and future work.

II. MONTGOMERY MULTIPLICATION

Montgomery multiplication of the integers $a$ and $b$ is defined for some integer $\delta$ as follows:

$$r = ab\,2^{-\delta} \bmod \theta.$$
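This computation can be performed without division using the precomputation $-\theta^{-1} \bmod 2^{\delta}$ (note that $\theta$ must be odd).

A. Montgomery Domain

In Montgomery multiplication, $a$, $b$, and $r$ are actually the residual representations of some integers $\tilde a$, $\tilde b$, and $\tilde r$:

$$a = \tilde a\,2^{\delta} \bmod \theta, \qquad b = \tilde b\,2^{\delta} \bmod \theta, \qquad r = \tilde r\,2^{\delta} \bmod \theta.$$

This special representation is called the Montgomery domain. Note that Montgomery multiplication satisfies

$$r = ab\,2^{-\delta} \bmod \theta \;\Rightarrow\; \tilde r\,2^{\delta} \bmod \theta = (\tilde a\,2^{\delta} \bmod \theta)(\tilde b\,2^{\delta} \bmod \theta)\,2^{-\delta} \bmod \theta \;\Rightarrow\; \tilde r = \tilde a \tilde b \bmod \theta.$$

As seen, the Montgomery product $r = ab\,2^{-\delta} \bmod \theta$ is actually the Montgomery domain computation of the modular multiplication $\tilde r = \tilde a \tilde b \bmod \theta$. The Montgomery product $r = ab\,2^{-\delta} \bmod \theta$ can be computed faster than $\tilde r = \tilde a \tilde b \bmod \theta$ when the precomputation $-\theta^{-1} \bmod 2^{\delta}$ is used. Of course, converting the integers $\tilde a$, $\tilde b$, and $\tilde r$ into $a$, $b$, and $r$ in the Montgomery domain and converting them back have a cost. Nevertheless, this cost is affordable if a large number of modular multiplications are performed in the Montgomery domain.

B. Montgomery Reduction

The Montgomery multiplication $r = ab\,2^{-\delta} \bmod \theta$ can be split into the multiplication step $u = ab$ and the reduction step $r = u\,2^{-\delta} \bmod \theta$. The Montgomery modular reduction $r = u\,2^{-\delta} \bmod \theta$ can be estimated without division by a value $\hat r$ as follows:

$$r = u\,2^{-\delta} \bmod \theta \equiv \hat r = (u + \theta q)/2^{\delta} \qquad (1)$$

where $q = u\psi \bmod 2^{\delta}$ is a parameter depending on the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta}$.

The rationale behind this modular reduction is as follows. Because $\theta$ is odd, $\gcd(\theta, 2^{\delta}) = 1$ and there exist two integers $(2^{-\delta} \bmod \theta)$ and $(-\theta^{-1} \bmod 2^{\delta})$ such that

$$2^{\delta}(2^{-\delta} \bmod \theta) - \theta(-\theta^{-1} \bmod 2^{\delta}) = 1.$$

The precomputation is $\psi = -\theta^{-1} \bmod 2^{\delta}$. Thus

$$2^{-\delta} \bmod \theta = (1 + \theta\psi)/2^{\delta}$$

and the Montgomery modular reduction is

$$r = u\,2^{-\delta} \bmod \theta \equiv (u + u\theta\psi)/2^{\delta} \equiv (u + \theta(u\psi \bmod 2^{\delta}))/2^{\delta} \equiv (u + \theta q)/2^{\delta}$$

where $q = u\psi \bmod 2^{\delta}$.

C. Calculation of Montgomery Product

Using the reduction in (1), the Montgomery modular multiplication $r = ab\,2^{-\delta} \bmod \theta$ can be estimated without division as follows:

$$r = ab\,2^{-\delta} \bmod \theta \equiv \hat r = (ab + \theta q)/2^{\delta}$$

where $q = ab\psi \bmod 2^{\delta}$ is a parameter depending on the precomputation $\psi = -\theta^{-1} \bmod 2^{\delta}$.

When $a < a_{\sup}$ and $b < 2^{\delta}$, Montgomery modular multiplication for the integer $\delta$ is

$$r = ab\,2^{-\delta} \bmod \theta \equiv \hat r = \frac{ab + \theta q}{2^{\delta}} < a_{\sup} + \theta \qquad (2)$$

where $q = ab\psi \bmod 2^{\delta}$ and $\psi = -\theta^{-1} \bmod 2^{\delta}$. Note that a final subtraction of the modulus $\theta$ may be needed to reduce $\hat r$ below the upper bound $a_{\sup}$ as follows:

$$r = ab\,2^{-\delta} \bmod \theta \equiv \bar r = \hat r - \varepsilon\theta < a_{\sup} \qquad (3)$$

where

$$\varepsilon = \begin{cases} 1, & \text{if } \hat r \ge \theta \\ 0, & \text{if } \hat r < \theta. \end{cases}$$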
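The reduction in (1) and the final subtraction in (3) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bound is taken as $a_{\sup} = \theta$, and the modular inverse comes from Python's built-in three-argument `pow` (Python 3.8 or later) rather than from any hardware-oriented method.

```python
# Illustrative sketch of Montgomery reduction; all names are hypothetical.
# Requires Python 3.8+ for pow(x, -1, m).

def neg_inverse(theta: int, k: int) -> int:
    """Return psi = -theta^{-1} mod 2^k; theta must be odd."""
    if theta % 2 == 0:
        raise ValueError("theta must be odd")
    modulus = 1 << k
    return (-pow(theta, -1, modulus)) % modulus


def montgomery_reduce(u: int, theta: int, k: int) -> int:
    """Estimate u * 2^{-k} mod theta as (u + theta*q) / 2^k, as in (1)."""
    psi = neg_inverse(theta, k)
    q = (u * psi) % (1 << k)
    total = u + theta * q
    # The low k bits of u + theta*q are zero, so the shift is an exact division.
    assert total % (1 << k) == 0
    return total >> k


def montgomery_multiply(a: int, b: int, theta: int, k: int) -> int:
    """Montgomery product for a < theta and b < 2^k (assumes a_sup = theta)."""
    estimate = montgomery_reduce(a * b, theta, k)
    # The estimate is below 2*theta here, so one subtraction is enough.
    return estimate - theta if estimate >= theta else estimate


def to_montgomery(x: int, theta: int, k: int) -> int:
    """Map an ordinary residue x to its Montgomery-domain representation."""
    return (x << k) % theta
```

Multiplying a Montgomery-domain value by 1 with `montgomery_multiply` converts it back to an ordinary residue, since that multiplies by $2^{-k}$ once more.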

In practice, the operands $a$ and $b$ and the modulus $\theta$ are represented by $n \ge \log_2 \theta$ bits. Also, the upper bound $a_{\sup}$ must not be less than the modulus $\theta$. That is,

$$a, b, \theta < 2^{n}, \qquad a < a_{\sup} \in [\theta, 2^{n}].$$

Then, Montgomery multiplication for the integer $n$ is

$$r = ab\,2^{-n} \bmod \theta \equiv \hat r = \frac{ab + \theta q}{2^{n}} < a_{\sup} + \theta \qquad (4)$$

where $q = ab\psi \bmod 2^{n}$ and $\psi = -\theta^{-1} \bmod 2^{n}$. Note that a final subtraction of the modulus $\theta$ may be needed to reduce $\hat r$ below the upper bound $a_{\sup}$ as follows:

$$r = ab\,2^{-n} \bmod \theta \equiv \bar r = \hat r - \varepsilon\theta < a_{\sup} \qquad (5)$$

where

$$\varepsilon = \begin{cases} 1, & \text{if } \hat r \ge a_{\sup} \in [\theta, 2^{n}] \\ 0, & \text{if } \hat r < a_{\sup} \in [\theta, 2^{n}]. \end{cases}$$

Now, there are two practical cases to consider here.
1) $a_{\sup} = \theta < 2^{n}$.
2) $\theta < a_{\sup} = 2^{n}$.

For the first case, the inequality in (4) becomes

$$r = ab\,2^{-n} \bmod \theta \equiv \hat r = (ab + \theta q)/2^{n} < 2\theta$$

where $q = ab\psi \bmod 2^{n}$ and $\psi = -\theta^{-1} \bmod 2^{n}$. Also, after the final reduction in (5), the result $\bar r$ is equal to the Montgomery product:

$$r = ab\,2^{-n} \bmod \theta = \bar r = \begin{cases} \hat r - \theta, & \text{if } \hat r \ge \theta \\ \hat r, & \text{if } \hat r < \theta. \end{cases}$$

The second case is called incomplete arithmetic [21]. For the second case, the inequality in (4) becomes

$$r = ab\,2^{-n} \bmod \theta \equiv \hat r = (ab + \theta q)/2^{n} < 2^{n} + \theta$$

where $q = ab\psi \bmod 2^{n}$ and $\psi = -\theta^{-1} \bmod 2^{n}$. Also, after the final reduction in (5), the result $\bar r$ is equivalent to the Montgomery product but may be larger than it:

$$r = ab\,2^{-n} \bmod \theta \equiv \bar r = \begin{cases} \hat r - \theta, & \text{if } \hat r \ge 2^{n} \\ \hat r, & \text{if } \hat r < 2^{n}. \end{cases}$$

Note that the final result satisfies $\bar r < 2^{n}$, and $\bar r < \theta$ does not necessarily hold. Thus, $\bar r$ is only an $n$-bit value equivalent to the Montgomery product in the second case. The purpose of incomplete arithmetic is to keep the operands and the results in the range $[0, 2^{n})$.

III. INTERLEAVED MODULAR MULTIPLICATION

The following algorithm implements the Montgomery multiplication $r = ab\,2^{-n} \bmod \theta$ given by (4) and (5).
1) $v = ab$ for $b = (b_{n-1}, \ldots, b_0)_2$.
2) $u = (v + \theta q)/2^{n}$ for $q = v\psi \bmod 2^{n}$.
3) If $u \ge a_{\sup}$, then $r = u - \theta$; else $r = u$.
Here, $\psi = -\theta^{-1} \bmod 2^{n}$ and $a < a_{\sup} \in [\theta, 2^{n}]$.
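The three steps above, together with the two choices of $a_{\sup}$, can be sketched in Python as follows. This is a hedged software model for checking the bounds, not a description of the hardware: the function name and parameter names are hypothetical, and the inverse is obtained with Python's built-in `pow` (Python 3.8 or later).

```python
# Software model of the full-width algorithm; names are illustrative assumptions.

def montgomery_full_width(a: int, b: int, theta: int, n: int, a_sup: int) -> int:
    """Return a value below a_sup that is congruent to a*b*2^{-n} mod theta.

    a_sup = theta gives the exact Montgomery product (case 1);
    a_sup = 2**n gives incomplete arithmetic (case 2).
    """
    radix = 1 << n
    if theta % 2 == 0 or not (theta <= a_sup <= radix):
        raise ValueError("need odd theta and theta <= a_sup <= 2^n")
    if not (0 <= a < a_sup and 0 <= b < radix):
        raise ValueError("need a < a_sup and b < 2^n")
    psi = (-pow(theta, -1, radix)) % radix   # psi = -theta^{-1} mod 2^n
    v = a * b                                # Step 1
    q = (v * psi) % radix
    u = (v + theta * q) >> n                 # Step 2: exact division by 2^n
    return u - theta if u >= a_sup else u    # Step 3
```

With `a_sup = theta` the return value is the Montgomery product itself. With `a_sup = 2**n` it is only guaranteed to fit in $n$ bits and to be congruent to the product modulo $\theta$, which is the behaviour described for incomplete arithmetic.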
The operands are multiplied in Step 1) and the resulting product is reduced in Step 2). These multiplication and reduction steps can be interleaved as follows:
1) $u = 0$.
2) For $i = 0$ to $n/\delta - 1$:
   a) $v = u + a\beta$ for $\beta = (b_{i\delta+\delta-1}, \ldots, b_{i\delta})_2$.
   b) $u = (v + \theta q)/2^{\delta}$ for $q = v\psi \bmod 2^{\delta}$.
3) If $u \ge a_{\sup}$, then $r = u - \theta$; else $r = u$.
Here, $\psi = -\theta^{-1} \bmod 2^{\delta}$ and $a < a_{\sup} \in [\theta, 2^{n}]$. In this algorithm, the operand $a$ is multiplied by $\delta$ bits of the operand $b$ in Step 2a) and the resulting product is reduced in Step 2b).

A. Fast Calculation of $\psi = -\theta^{-1} \bmod 2^{\delta}$ for $\delta \le 8$

The precomputation $\psi$ must be calculated from the modulus $\theta$ and loaded into a register before modular multiplication. For this purpose, we present the Boolean functions that give bits 0 to 7 of $\psi$ in terms of the bits of the modulus $\theta$ as follows:

$$\psi_0 = 1, \qquad \psi_1 = \bar\theta_1, \qquad \psi_2 = \bar\theta_2, \qquad \psi_3 = \overline{\theta_1 \oplus \theta_2 \oplus \theta_3}, \qquad \psi_4 = (\theta_1 \oplus \theta_3)(\theta_2 \oplus \theta_3) \oplus \bar\theta_4$$

ψ 5 = (θ 1 θ 3 )(θ 2 θ 3 )(θ 3 θ 4 ) θ 1 θ 4 θ 5
ψ 6 = (θ 1 θ 5 )(θ 2 θ 5 )(θ 3 θ 5 )(θ 4 θ 5 ) (θ 1 θ 4 )(θ 2 θ 4 )(θ 3 θ 5 ) (θ 1 θ 5 )(θ 2 θ 4 ) θ 6,
ψ 7 = (θ 1 θ 6 )(θ 2 θ 6 )(θ 3 θ 6 )(θ 4 θ 6 )(θ 5 θ 6 ) (θ 1 θ 2 )(θ 2 θ 4 )(θ 4 θ 6 )(θ 5 θ 6 ) (θ 1 θ 6 )(θ 2 θ 6 )(θ 3 θ 6 ) (θ 1 θ 2 )(θ 4 θ 6 ) θ 7. (6)

These Boolean functions can be implemented in hardware easily and used in digit-serial modular multipliers with digit size $\delta \le 8$. As an example, let $\theta \bmod 2^{8} = 185$. That is,

$$\theta \bmod 2^{8} = (\theta_7 \ldots \theta_1 \theta_0)_2 = (10111001)_2 = 185$$

and the proposed Boolean functions yield

$$\psi = (\psi_7 \ldots \psi_1 \psi_0)_2 = (01110111)_2 = 119.$$

Note that the calculated $\psi$ is really equal to $-\theta^{-1} \bmod 2^{\delta}$ for any $\delta \le 8$ as follows:

$$\theta\psi \bmod 2^{\delta} = 185 \cdot 119 \bmod 2^{\delta} = 22015 \bmod 2^{\delta} = -1 \bmod 2^{\delta}.$$

These formulas were found partly by induction and partly by trial and error. Their correctness was checked with mathematical software.

B. Intermediate Results

In each iteration of the interleaved modular multiplication, the variable $u$ is recalculated. The upper bound of $u$ must be known to decide the number of bits needed to represent it in hardware implementations. The following theorem gives the value of $u$ in each iteration and an upper bound for these values. To the best of our knowledge, the current literature does not contain such a theorem or its proof.
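As a software cross-check of this section, the sketch below computes $\psi$ for a given digit size and runs the interleaved loop of Steps 1) to 3). Note the substitution: instead of the Boolean functions in (6), it uses a standard Newton (Hensel) lifting iteration $x \leftarrow x(2 - \theta x)$, which doubles the number of correct low-order bits of $\theta^{-1}$ at each step. The function names are hypothetical, and $n$ is assumed to be a multiple of $\delta$.

```python
# Hedged sketch: Newton lifting replaces the paper's Boolean formulas here.

def neg_inverse_pow2(theta: int, delta: int) -> int:
    """Return psi = -theta^{-1} mod 2^delta for odd theta."""
    if theta % 2 == 0:
        raise ValueError("theta must be odd")
    x, bits = 1, 1                 # x = 1 is the inverse of any odd theta mod 2
    while bits < delta:
        bits *= 2
        x = (x * (2 - theta * x)) % (1 << bits)   # now correct mod 2^bits
    return (-x) & ((1 << delta) - 1)


def interleaved_montgomery(a: int, b: int, theta: int, n: int,
                           delta: int, a_sup: int) -> int:
    """Digit-serial model of Steps 1)-3); assumes delta divides n."""
    if n % delta != 0:
        raise ValueError("this sketch assumes n is a multiple of delta")
    mask = (1 << delta) - 1
    psi = neg_inverse_pow2(theta, delta)
    u = 0                                          # Step 1
    for i in range(n // delta):                    # Step 2
        beta = (b >> (i * delta)) & mask           # next delta-bit digit of b
        v = u + a * beta                           # Step 2a
        q = (v * psi) & mask
        u = (v + theta * q) >> delta               # Step 2b: low digit is zero
    return u - theta if u >= a_sup else u          # Step 3
```

For the example modulus above, `neg_inverse_pow2(185, 8)` reproduces the value $\psi = 119$ stated in the text.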

Theorem 1: In the iteration $i \in \{0, 1, \ldots, n/\delta - 1\}$ of the interleaved Montgomery modular multiplication,

$$u = \frac{aB + \theta\,\bigl(aB(-\theta^{-1}) \bmod 2^{(i+1)\delta}\bigr)}{2^{(i+1)\delta}} < a_{\sup} + \theta$$

for the operands $a < a_{\sup}$ and $b < 2^{n}$, where $B = b \bmod 2^{(i+1)\delta} = (b_{(i+1)\delta-1}, \ldots, b_0)_2$.

Proof: Let the Montgomery product of $a$ and $B$ with respect to the integer $(i+1)\delta$ be denoted as follows:

$$r_{(i+1)\delta} = aB\,2^{-(i+1)\delta} \bmod \theta.$$

The Montgomery product with respect to the integer $\delta$ and its estimate are given by (2). The multiplier is $b < 2^{\delta}$ in (2) and $B < 2^{(i+1)\delta}$ here. Thus, we substitute $B$ for $b$ and $2^{(i+1)\delta}$ for $2^{\delta}$ in (2) and obtain the estimate $\hat r_{(i+1)\delta}$ of $r_{(i+1)\delta}$ as follows:

$$r_{(i+1)\delta} \equiv \hat r_{(i+1)\delta} = (aB + \theta q)/2^{(i+1)\delta} < a_{\sup} + \theta$$

where $q = aB(-\theta^{-1}) \bmod 2^{(i+1)\delta}$. Note that $u$ given in the theorem is nothing else than the estimate of the Montgomery product of $a$ and $B$ with respect to the integer $(i+1)\delta$:

$$\hat r_{(i+1)\delta} \equiv r_{(i+1)\delta} = aB\,2^{-(i+1)\delta} \bmod \theta.$$

As a result, the theorem actually claims that $u = \hat r_{(i+1)\delta} < a_{\sup} + \theta$. Now, we must prove that $u = \hat r_{(i+1)\delta}$.

According to the theorem, the zeroth iteration of the interleaved Montgomery algorithm computes

$$u = \hat r_{(i+1)\delta}\big|_{i=0} = \hat r_{\delta} = \frac{aB + \theta\,\bigl(aB(-\theta^{-1}) \bmod 2^{\delta}\bigr)}{2^{\delta}}$$

where $B = (b_{\delta-1}, \ldots, b_0)_2 < 2^{\delta}$. This is true because after Steps 2a) and 2b) in the zeroth iteration of the algorithm,

$$u = (a\beta + \theta q)/2^{\delta}$$

where $q = a\beta(-\theta^{-1}) \bmod 2^{\delta}$ and $\beta = (b_{\delta-1}, \ldots, b_0)_2$.

Now, the proof can be completed by induction if we show that when the $i$th iteration computes $u = \hat r_{(i+1)\delta}$, the $(i+1)$th iteration computes $u = \hat r_{(i+2)\delta}$ in the interleaved Montgomery algorithm. Also, note that

$$\beta = (b_{(i+1)\delta+\delta-1}, \ldots, b_{(i+1)\delta})_2 = \sum_{j=0}^{\delta-1} b_{(i+1)\delta+j}\,2^{j} = \frac{1}{2^{(i+1)\delta}} \sum_{j=(i+1)\delta}^{(i+1)\delta+\delta-1} b_j 2^{j}$$

in the $(i+1)$th iteration. Let $k = (i+1)\delta$. Then, we must show that if some iteration computes $u = \hat r_k$, the next iteration computes $u = \hat r_{k+\delta}$, where

$$\beta = \frac{1}{2^{k}} \sum_{j=k}^{k+\delta-1} b_j 2^{j}.$$

Assume that some iteration computes

$$u = \hat r_k = \frac{a \sum_{j=0}^{k-1} b_j 2^{j} + \theta \bigl[ a \sum_{j=0}^{k-1} b_j 2^{j}\,(-\theta^{-1}) \bmod 2^{k} \bigr]}{2^{k}}.$$

As seen from the interleaved Montgomery algorithm, each iteration performs

$$u \leftarrow \frac{u + a\beta + \theta q}{2^{\delta}} = \frac{u + a\beta + \theta\bigl((u + a\beta)\psi \bmod 2^{\delta}\bigr)}{2^{\delta}}.$$

Thus, the next iteration must compute

$$u = \hat r = \frac{\hat r_k + a\beta + \theta\bigl[(\hat r_k + a\beta)(-\theta^{-1}) \bmod 2^{\delta}\bigr]}{2^{\delta}}.$$

Let us multiply both numerator and denominator by $2^{k}$:

$$\hat r = \frac{2^{k}(\hat r_k + a\beta) + \theta\bigl[2^{k}(\hat r_k + a\beta)(-\theta^{-1}) \bmod 2^{k+\delta}\bigr]}{2^{k+\delta}}.$$

Note that $2^{k}(\hat r_k + a\beta) = P + \theta Q$, where

$$P = a \sum_{j=0}^{k+\delta-1} b_j 2^{j}, \qquad Q = a \sum_{j=0}^{k-1} b_j 2^{j}\,(-\theta^{-1}) \bmod 2^{k}.$$

Then, the computation in the next iteration becomes

$$\hat r = \frac{P + \theta Q + \theta\bigl[\bigl(P(-\theta^{-1}) - Q\bigr) \bmod 2^{k+\delta}\bigr]}{2^{k+\delta}}.$$

Note that $P(-\theta^{-1}) \bmod 2^{k+\delta} \ge Q$, because the two sides agree modulo $2^{k}$ and $Q < 2^{k}$, as seen in the following:

$$P(-\theta^{-1}) \bmod 2^{k+\delta} = a \sum_{j=0}^{k+\delta-1} b_j 2^{j}\,(-\theta^{-1}) \bmod 2^{k+\delta} \;\ge\; Q = a \sum_{j=0}^{k-1} b_j 2^{j}\,(-\theta^{-1}) \bmod 2^{k}.$$

Thus, we can further simplify the computation in the next iteration as follows:

$$\hat r = \frac{P + \theta\bigl[P(-\theta^{-1}) \bmod 2^{k+\delta}\bigr]}{2^{k+\delta}}.$$

Then, $\hat r = \hat r_{k+\delta}$ as follows:

$$\hat r = \frac{a \sum_{j=0}^{k+\delta-1} b_j 2^{j} + \theta\bigl[a \sum_{j=0}^{k+\delta-1} b_j 2^{j}\,(-\theta^{-1}) \bmod 2^{k+\delta}\bigr]}{2^{k+\delta}}.$$

Now, the proof by induction is complete.

IV. PROPOSED MODULAR MULTIPLIER

Algorithm 1 gives the bit-level interleaved Montgomery modular multiplication. The operands and the modulus $\theta$ in this algorithm are $n$ bits long. That is,

$$a < a_{\sup} = 2^{n}, \qquad b < 2^{n}, \qquad \theta < 2^{n}.$$

Then, it follows from Theorem 1 that the intermediate variable $u$ in Algorithm 1 satisfies

$$u < a_{\sup} + \theta = 2^{n} + \theta$$

and is thus an $(n+1)$-bit variable. Also, note that

$$a < 2^{n}, \qquad \beta < 2^{\delta}, \qquad a\beta < 2^{n+\delta}.$$

Then, as seen from Algorithm 1,

$$v = a\beta + u < 2^{n+\delta} + 2^{n+1} \qquad (7)$$

and $v$ is an $(n+\delta+1)$-bit variable.
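The closed form in Theorem 1 and the bit-length bounds that follow from it can be checked numerically. The sketch below is an illustrative test harness, not part of the proposed architecture: it re-runs Steps 2a) and 2b) in software, compares each intermediate $u$ with the closed form, and records $v$ so the bound in (7) can be checked. The function name is hypothetical, $n$ is assumed to be a multiple of $\delta$, and Python 3.8 or later is assumed for `pow(x, -1, m)`.

```python
# Numerical check of Theorem 1; a test harness under stated assumptions.

def theorem1_trace(a: int, b: int, theta: int, n: int, delta: int):
    """Return a list of (u, closed_form, v) tuples, one per iteration."""
    mask = (1 << delta) - 1
    psi = (-pow(theta, -1, 1 << delta)) & mask     # psi = -theta^{-1} mod 2^delta
    u = 0
    trace = []
    for i in range(n // delta):
        beta = (b >> (i * delta)) & mask
        v = u + a * beta                           # Step 2a
        u = (v + theta * ((v * psi) & mask)) >> delta   # Step 2b
        k = (i + 1) * delta
        big_b = b & ((1 << k) - 1)                 # B = b mod 2^k
        q = (a * big_b * -pow(theta, -1, 1 << k)) % (1 << k)
        closed_form = (a * big_b + theta * q) >> k # value claimed by Theorem 1
        trace.append((u, closed_form, v))
    return trace
```

For $n$-bit operands with $a_{\sup} = 2^{n}$, every `u` in the trace should equal `closed_form` and stay below $2^{n} + \theta$, and every `v` should stay below $2^{n+\delta} + 2^{n+1}$.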

Algorithm 1. Digit-Serial Montgomery Modular Multiplication for a $\delta$-bit Digit Size

Algorithm 2. Digit-Serial Montgomery Modular Multiplication With Carry-Save Addition for a $\delta$-bit Digit Size

Algorithm 2 is the same as Algorithm 1, except that it keeps the variables $u$ and $v$ in carry-save form. Fig. 1 illustrates Algorithm 2. The following key observations can be made from Fig. 1.
1) The first accumulator takes $u$ in carry-save form and computes $v = a\beta + u$ in Step 2a) of Algorithm 2 in carry-save form.
2) $v$ can be represented by an $(n+\delta-1)$-bit save vector and an $(n+\delta)$-bit carry vector because $v < 2^{n+\delta} + 2^{n+1}$ as seen from (7). However, the least significant $\delta$ bits of the carry vector and the save vector are added together for use in the calculation of $q = \psi v \bmod 2^{\delta}$.
3) The second accumulator takes the carry-save vectors representing $v$ and computes $\theta q + v$ in Step 2b) of Algorithm 2, whose least significant $\delta$ bits are always zero. Then, these zero bits are removed to obtain $u = (\theta q + v)/2^{\delta}$.

Fig. 1. Digit-serial Montgomery modular multiplication circuit.

A. Area Complexity

The area complexity of the multiplier is as follows:

$2n + \delta$ flip-flops
$2n\delta + \delta^{2}/2 - 3\delta/2$ AND gates
$2n\delta + \delta^{2}/2 - 5\delta/2 + 1$ full adders
$3\delta - 2$ half adders. (8)

The above complexity does not include the area needed to store the multiplicands and modulus. Also, the complexity of computing $\psi = -\theta^{-1} \bmod 2^{\delta}$ is neglected because this cost is relatively small and very implementation dependent.

The area complexity of the proposed multiplier is the sum of the following area requirements.
1) Storing the carries and sums $(c^{(u)}_j, s^{(u)}_j)$ requires $2n + 1$ flip-flops. Also, storing $\psi$ requires $\delta$ flip-flops as seen from Fig. 1. However, one flip-flop can be saved since $\psi_0 = 1$ always.
2) The first accumulator requires $n\delta$ AND gates and $n\delta$ adders to accumulate the partial products $a\beta$. However, $\delta - 2$ of the adders are half adders as can be understood from Fig. 2.
3) The calculation of $q_i$ requires $\delta(\delta+1)/2$ AND gates, $(\delta-2)(\delta-1)/2$ full adders, and $\delta - 1$ half adders as can be understood from Fig. 3. However, $\delta$ AND gates can be saved because $\psi_0 = 1$ always.
4) The second accumulator requires $n\delta$ AND gates to compute $\theta q_i$ and $(n-1)(\delta-2) + 2(n+\delta-1)$ adders to add them over the outputs $(c^{(v)}_j, s^{(v)}_j)$ of the first accumulator, as can be understood from Fig. 3. However, $\delta$ AND gates can be saved because $\theta_0 = 1$ always. Also, $\delta + 2$ of the adders are half adders. Moreover, one half adder can be saved since the sum $s^{(v)}_0 + \theta_0 q_0$ is redundant. Note that because $\psi_0 = \theta_0 = 1$ always,

$$s^{(v)}_0 + \theta_0 q_0 = s^{(v)}_0 + \theta_0 s^{(v)}_0 \psi_0 = 2 s^{(v)}_0.$$

Fig. 2. Accumulation of the partial products $a\beta$ in Step 2a) for the digit size $\delta = 6$ bits and the operand size $n = 10$ bits.

Fig. 3. Computation of $q$ and accumulation of $\theta q$ in Step 2b) for the digit size $\delta = 6$ bits and the operand size $n = 10$ bits.

B. Time Complexity

The critical path delay of the multiplier is

$$\begin{cases} 3T_{AND} + T_{HA} + (\delta + 4)T_{FA}, & \text{for } \delta > 2 \\ 3T_{AND} + T_{HA} + 4T_{FA}, & \text{for } \delta = 2 \end{cases} \qquad (9)$$

where $T_{AND}$ is the AND gate delay, $T_{FA}$ is the full adder delay, and $T_{HA}$ is the half adder delay. The critical path delay is the sum of the following delays as can be understood from Figs. 2 and 3:
1) the delay $T_{AND} + (i+1)T_{FA}$ to compute $s^{(v)}_i$ in the first accumulator for $i < \delta$;
2) the delay $T_{AND} + T_{FA} + T_{HA}$ to compute $q_i$ from $s^{(v)}_i$ for $2 \le i < \delta$;
3) the delay $T_{AND} + (\delta - i + 2)T_{FA}$ to compute $\theta q_i$ and accumulate them in the second accumulator for $2 \le i < \delta$.

C. Final Reduction

In Step 3 of Algorithm 2, the $(n+1)$-bit addition

$$t = (c^{(u)}[n]\; c^{(u)}[n-1]\; c^{(u)}[n-2] \ldots c^{(u)}[0])_2 + (0\; s^{(u)}[n-1]\; s^{(u)}[n-2] \ldots s^{(u)}[0])_2 \qquad (10)$$

is performed. The output is $r = t$ if the most significant bit of the sum is $t[n] = 0$. Otherwise,

$$r = (t[n]\; t[n-1]\; t[n-2] \ldots t[1]\; t[0])_2 - (0\; \theta[n-1]\; \theta[n-2] \ldots \theta[1]\; \theta[0])_2. \qquad (11)$$

The data are kept in carry-save form in the previous steps of the proposed algorithm to avoid large carry propagation delays. Thanks to this strategy, the critical path delay for the implementation of the previous steps is linear in the digit size $\delta$ but not in the operand size $n$, as seen in (9). This is desirable since usually $n \gg 1$ in practical applications. Unfortunately, the final addition and subtraction in Step 3 of Algorithm 2 require $(n+1)$-bit carry and borrow propagations. Thus, these operations must be performed by a fast adder, possibly over more than one clock cycle. Otherwise, the critical path delay will be $O(n)$ instead of $O(\delta)$.

The carry-select adder (CSA) is a fast and simple adder. Thus, it is a good choice to implement the final addition and subtraction. Fig. 4 shows an example Verilog implementation of Step 3 of Algorithm 2 with CSA. In this example, the operands of a 1024-bit sum are divided into smaller 168- and 172-bit blocks. These smaller operand blocks are added with two different ripple carry adders, one with carry-in 0 and the other with carry-in 1. One of these two additions is always correct, and thus the correct 1024-bit sum is actually obtained. However, the correct input carries must be determined and the correct bits must be selected using multiplexers.
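The carry-select idea and the final reduction in (10) and (11) can be modelled in software as below. This is a behavioural sketch only, not the Verilog of Fig. 4: it uses equal-width blocks, ignores pipelining, and assumes that the "negated modulus" is the two's complement $2^{n} - \theta$. All function and parameter names are hypothetical.

```python
# Behavioural model of carry-select addition; not the paper's Verilog.

def carry_select_add(x: int, y: int, width: int, block: int):
    """Add two width-bit values block by block; return (sum mod 2^width, carry)."""
    if width % block != 0:
        raise ValueError("this sketch assumes block divides width")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        xb = (x >> lo) & mask
        yb = (y >> lo) & mask
        s0 = xb + yb            # candidate sum assuming carry-in 0
        s1 = xb + yb + 1        # candidate sum assuming carry-in 1
        s = s1 if carry else s0 # multiplexer driven by the previous block
        result |= (s & mask) << lo
        carry = s >> block
    return result, carry


def final_reduction(c_vec: int, s_vec: int, theta: int, n: int, block: int) -> int:
    """Model Step 3: t = C + S, then r = t if t[n] == 0 else t - theta.

    Assumes c_vec < 2^(n+1), s_vec < 2^n, and c_vec + s_vec < 2^n + theta.
    """
    mask = (1 << n) - 1
    t_low, carry = carry_select_add(c_vec & mask, s_vec, n, block)
    t_msb = (c_vec >> n) ^ carry          # bit t[n] of the (n+1)-bit sum
    neg_theta = (-theta) & mask           # assumed two's complement of theta
    tt, _ = carry_select_add(t_low, neg_theta, n, block)
    return tt if t_msb else t_low
```

In hardware both candidate sums of every block are computed in parallel and only the selection ripples from block to block; the sequential loop here reproduces the result, not the timing.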
Though the additions are almost doubled and multiplexers are used, the 168- and 172-bit additions in the example are performed 1024/168 and 1024/172 times faster than the 1024-bit addition. Thus, addition with CSA is faster than the usual addition. Moreover, a 1024-bit addition can be pipelined and performed in more than one clock cycle to reduce the critical path. The additions in Fig. 4 are performed in two clock cycles.

There are two 1024-bit additions in Fig. 4. The first one is the sum of the carry-save vectors in (10) and the second one is the subtraction of the modulus $\theta$ in (11):

t[1024:0] = C[1024:0] + S[1023:0]
tt[1023:0] = t[1023:0] + θ[1023:0]

where θ[1023:0] is the negated modulus. In Fig. 4, the enableadd signal must be high for three clock cycles. The correct values of the addition and subtraction results are obtained during these three clock cycles as follows.
1) t[511:0] and the carry mmc1 in the first clock cycle.
2) t[1024:512], tt[511:0], and the borrow mmb1 in the second clock cycle.
3) tt[1023:512] in the third clock cycle.
As seen, the subtraction result tt[1023:0] lags one clock cycle since it depends on the addition result t[1024:0]. In the fourth clock cycle, the carry out of the addition result, t[1024], is checked. If it is set, the output r[1023:0] is the subtraction result. If not, the output r[1023:0] is the addition result.

Fig. 4. Example Verilog implementation of Step 3 of Algorithm 2 for 1024-bit operands with CSA using a 2-stage pipeline. Here, C[1023:0] and S[1023:0] are carry-save vectors. θ[1023:0] is the negated modulus.

The area complexity of Step 3 of Algorithm 2 for the operand size $n$ is roughly as follows:

$$4nA_{FA} + 2nA_{REG} + 3nA_{MUX2} \qquad (12)$$

where $A_{FA}$, $A_{REG}$, and $A_{MUX2}$ are, respectively, the full adder, flip-flop, and multiplexer areas. Nearly $4n$ full adders are needed because the addition and the subtraction in Step 3 of Algorithm 2 both require $n$ full adders, and the carry-select strategy almost doubles the adders. At least $2n$ flip-flops are needed because the addition and subtraction results must both be stored. Almost $n$ multiplexers are needed for addition with carry-select adders. Almost $n$ multiplexers are needed for subtraction with carry-select adders. Exactly $n$ multiplexers are needed to select one of the addition result and the subtraction result.

The critical path delays of the carry-select adders in Step 3 of Algorithm 2 must be reduced by pipelining so that these delays do not exceed the critical path delay of the Montgomery multiplication performed in carry-save form. Then, the number of clock cycles required for Step 3 is

$$P + 2 \qquad (13)$$

where $P$ is the number of pipeline stages used in the carry-select adders. $P$ cycles are needed to perform the final addition and subtraction in Step 3 with carry-select adders. The $(P+1)$th cycle is needed to finish the final subtraction, which depends on the final addition result. The $(P+2)$th cycle is needed to select one of the addition and subtraction results.

V. PERFORMANCE COMPARISONS

TABLE I. AREA COMPLEXITY OF DIFFERENT IMPLEMENTATIONS

Table I gives the area complexities of some Montgomery multipliers with digit size $\delta = 1$ bit and of the proposed multiplier for digit sizes $\delta = 2, 4, 8$ bits. As seen, the proposed multiplier has a larger area complexity because its digit size $\delta > 1$.
The area complexity for the proposed multiplier is the sum of the complexities in (8) and (12) plus the register $a$ and the register $\theta$ shown in Fig. 1.

Note that the multipliers with digit size $\delta = 1$ bit do not need fast adders. This is because they always keep their operands and results in carry-save form. These multipliers avoid the final subtraction in the last step of Montgomery multiplication using the clever scheme in [22]. In this scheme, the operands are bounded by two times the modulus, that is, $a, b < 2\theta$, and the Montgomery multiplication for the integer $n + 2$,

$$r = (ab + \theta q)/2^{n+2} \equiv ab\,2^{-(n+2)} \bmod \theta,$$

is computed instead of $(ab + \theta q)/2^{n} \equiv ab\,2^{-n} \bmod \theta$. It can be shown that $r < 2\theta$ in this scheme. Thus, the operands and the result are all bounded by $2\theta$ for the multiplier designs with digit size $\delta = 1$ bit in Table I. Consequently, these multipliers can perform successive modular multiplications by keeping their inputs and outputs in carry-save form.

Table II compares the time complexity of the proposed multiplier with the complexities of the Montgomery multipliers with digit size $\delta = 1$ bit. As seen, though the proposed multiplier has a larger critical path delay, it requires fewer clock cycles to finish its computation as the digit size $\delta$ gets larger. Thus, the total computation time of the proposed multiplier is better for large $\delta$. The time complexity for the proposed multiplier is obtained from (9) and (13).
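For quick what-if comparisons, the formulas (8), (9), (12), and (13) can be evaluated with a small calculator such as the sketch below. It only restates those formulas: the function names and dictionary keys are hypothetical, the gate delays passed in are placeholder units chosen by the caller rather than measured values, and $n$ is assumed to be a multiple of $\delta$.

```python
# Calculator for formulas (8), (9), (12), (13); names and units are assumptions.

def multiplier_area(n: int, delta: int) -> dict:
    """Component counts from (8) for operand size n and digit size delta."""
    return {
        "flip_flops": 2 * n + delta,
        "and_gates": 2 * n * delta + (delta * delta - 3 * delta) // 2,
        "full_adders": 2 * n * delta + (delta * delta - 5 * delta) // 2 + 1,
        "half_adders": 3 * delta - 2,
    }


def final_step_area(n: int) -> dict:
    """Rough component counts from (12) for the carry-select final step."""
    return {"full_adders": 4 * n, "flip_flops": 2 * n, "mux2": 3 * n}


def critical_path_delay(delta: int, t_and: float, t_ha: float, t_fa: float) -> float:
    """Critical path delay from (9); defined there for delta >= 2 only."""
    if delta < 2:
        raise ValueError("formula (9) covers delta >= 2")
    fa_stages = delta + 4 if delta > 2 else 4
    return 3 * t_and + t_ha + fa_stages * t_fa


def total_cycles(n: int, delta: int, pipeline_stages: int) -> int:
    """n/delta iterations plus the P + 2 cycles of (13)."""
    return n // delta + pipeline_stages + 2
```

Multiplying `total_cycles` by `critical_path_delay` gives a first-order estimate of the total computation time for a chosen set of gate delays.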

TABLE II. TIME COMPLEXITY OF DIFFERENT IMPLEMENTATIONS

TABLE III. AREA AND TIME PERFORMANCES OF FPGA IMPLEMENTATIONS

The number of cycles required by the proposed multiplier is $n/\delta + P_{\delta} + 2 \approx n/\delta$. This is because $512 \le n \le 2048$, while the number of pipeline stages of the fast adders, $P_{\delta}$, is a small number in practice. Thus, the total computation time of the proposed multiplier can be approximated by

$$(\delta + 4)\,\frac{n}{\delta}\,T_{FA} + \frac{n}{\delta}\,(3T_{AND} + T_{HA}).$$

As a result, the total computation time of the proposed multiplier is less than those of the other works in Table II for large digit size $\delta$.

FPGA implementation results are presented in Table III. We have written Verilog code for the proposed digit-serial multiplier and the Montgomery multipliers proposed in [8] and [10]. The Verilog code is synthesized and implemented in the Xilinx Vivado tool for Virtex-7, device xc7vx330t, package ffg1157 with speed grade -3. Table III gives the timing results from the implementation process for the default Vivado optimization strategy. The Vivado tool also gives the required numbers of slices, LUTs, and flip-flops for the implementation. It gives these numbers for the total implementation and for the individual modules (fast adder unit and Montgomery multiplier) separately. The complexities given in Table III are for the total implementations.

As seen, the shortest clock period belongs to the multiplier proposed in [10]. However, the smallest area-time product belongs to the proposed multiplier with digit size $\delta = 2$ bits. As $\delta$ gets larger, the total computation time decreases, but the area-time product gets larger too. Nevertheless, the proposed multiplier with digit size $\delta = 4$ bits still has a smaller area-time product than the multipliers proposed in [8] and [10].

Also, the proposed design with digit size $\delta = 2$ requires fewer FPGA slices than the designs in [8] and [10].
This is interesting because, though the proposed design needs fewer flip-flops, the designs in [8] and [10] have a substantial advantage over the proposed design in terms of combinational circuits as seen from Table I. There are two explanations for this. First, FPGA slices have dedicated carry chains, and thus the fast additions in the proposed design can be performed more efficiently. Second, the designs in [8] and [10] keep their inputs and outputs in carry-save form, unlike the proposed design. Because very large numbers are multiplied, representing all the data with two vectors complicates the FPGA design. In particular, the input/output units of these designs require a large number of slices.

One of the important improvements of the proposed design is the total computation time. The proposed multiplier has a smaller computation time than [8] and [10] for all digit sizes. The proposed multiplier with digit size $\delta = 4$ bits has a computation time at least two times smaller than [8] and [10]. However, the gain in computation time declines for the digit size $\delta = 8$.

Tables I-III give a very good idea of the performance of the proposed digit-serial multiplier. There are many other high-radix Montgomery multipliers in the literature. However, some of them are designed for a certain digit size [12], [15] and some are designed for certain types of modulus [16]. Also, the performances of these multipliers are usually given for Virtex 2 devices, which are considerably old technology. The work in [18] gives Virtex 5 performance results. However, this work is not a complete design. It neither gives the details of the fast adder implementation required for the final subtraction nor mentions the complexity of this implementation.

The multiplier in [18] tries to convert one of the operands into a sparse integer representation by a method based on canonic signed digit recoding. After the conversion, signed digits are multiplied by the second operand and zero digits are skipped. Our multiplier can also be optimized using similar techniques. However, the required number of clock cycles for computation would then vary greatly according to the operands.
The computation would be shorter for operands with many consecutive ones and zeros but would be longer for other types of operands. This variable computation time can lead to side-channel attacks in cryptographic applications such as RSA and ECC besides many other problems. VI. CONCLUSION This paper presents a detailed analysis of Montgomery algorithm and its digit serial computation. The precomputation ψ = θ 1 mod 2 δ is given in (6) for digit size δ 8. The computed intermediate results and their tight upperbound are given in Theorem 1. Using our analysis, we have developed a general digit serial architecture for the Montgomery algorithm, which can be implemented for any modulus θ and any digit size δ. To decrease the critical path of the multiplier, the accumulated product is kept in carry-save form. However, an addition and a subtraction are needed to obtain the final result. We have shown in detail how these addition and subtraction are performed with CSAs. CSAs are very suitable for FPGA implementations because FPGAs have fast carry chains to perform large additions. The CSA strategy just divides the huge additions required by applications into smaller additions, which can be performed by the carry chains in FPGAs in a desired time. The total computation time of the proposed digit serial multiplier is much less than the classical bit serial multipliers. However, this improvement comes at the expense of area. The area requirement increases rapidly with digit size δ. The FPGA implementation results show that the minimum area time product is achieved for digit size δ = 2. The proposed architecture with digit size δ = 2 has a considerably better area time product than the classical bit serial multiplication. Also, the proposed architecture with digit size δ = 4 has a comparable area time product with the classical bit serial multiplication. 
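The digit-serial Montgomery computation summarized in this conclusion can be illustrated with a short software sketch. It is not the proposed hardware architecture: plain Python integers stand in for the carry-save vectors and the fast adders, and the function name and parameters are hypothetical. The sketch also assumes the common sign convention ψ = −θ^{-1} mod 2^δ, under which a multiple of θ is added in each step; with ψ = θ^{-1} mod 2^δ that multiple would be subtracted instead. The operands are assumed to satisfy a < 2^(δ·n_digits) and b < θ with θ odd, so that a single final subtraction suffices.

```python
def montgomery_digit_serial(a, b, theta, delta, n_digits):
    """Return a * b * 2^(-delta * n_digits) mod theta (illustrative sketch).

    Assumptions (not taken from the paper): theta is odd, b < theta,
    a < 2^(delta * n_digits), and psi = -theta^(-1) mod 2^delta.
    """
    radix = 1 << delta
    mask = radix - 1
    # Precomputation based on the modulus (needs Python 3.8+ for pow(x, -1, m)).
    psi = (-pow(theta, -1, radix)) % radix

    acc = 0
    for i in range(n_digits):
        # Take the i-th delta-bit digit of a, least significant digit first.
        a_i = (a >> (delta * i)) & mask
        acc += a_i * b
        # Choose q so that acc + q * theta is divisible by 2^delta.
        q = ((acc & mask) * psi) & mask
        # Exact division by 2^delta, done as a right shift.
        acc = (acc + q * theta) >> delta

    # acc stays below b + theta, so one conditional subtraction is enough.
    if acc >= theta:
        acc -= theta
    return acc


if __name__ == "__main__":
    theta, delta, n_digits = 239, 4, 2
    a, b = 123, 200
    result = montgomery_digit_serial(a, b, theta, delta, n_digits)
    reference = (a * b * pow(1 << (delta * n_digits), -1, theta)) % theta
    print(result == reference)
```

A larger δ reduces the number of loop iterations for the same operand size, which mirrors the time-versus-area trade-off discussed above: fewer steps, but a wider digit multiplication in each step.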
As future research, the scheme in [22] can be used to avoid the final subtraction in the last step of the Montgomery algorithm, and a new digit-serial modular multiplier can be developed. In this way, applications can perform successive modular multiplications by keeping the operands and the results in carry-save form all the time. The advantage of such an architecture is that it will not need any fast adders. Its disadvantage is that each operand and each computed result is represented with a carry vector and a save vector. Thus, the cost of storing and manipulating the data will increase.

REFERENCES

[1] R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Commun. ACM, vol. 21, no. 2, Feb.
[2] N. Koblitz, "Elliptic curve cryptosystems," Math. Comput., vol. 48, no. 177.
[3] V. S. Miller, "Use of elliptic curves in cryptography," in Advances in Cryptology - CRYPTO (Lecture Notes in Computer Science), vol. 218, H. C. Williams, Ed. New York, NY, USA: Springer-Verlag, 1986.
[4] P. L. Montgomery, "Modular multiplication without trial division," Math. Comput., vol. 44, no. 170, Apr.
[5] C.-C. Yang, T.-S. Chang, and C.-W. Jen, "A new RSA cryptosystem hardware design based on Montgomery's algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 45, no. 7, Jul.
[6] A. F. Tenca and Ç. K. Koç, "A scalable architecture for Montgomery multiplication," in Cryptographic Hardware and Embedded Systems (Lecture Notes in Computer Science), Ç. K. Koç and C. Paar, Eds. London, U.K.: Springer-Verlag, 1999.
[7] A. F. Tenca and Ç. K. Koç, "A scalable architecture for modular multiplication based on Montgomery's algorithm," IEEE Trans. Comput., vol. 52, no. 9, Sep.
[8] C. McIvor, M. McLoone, and J. V. McCanny, "Modified Montgomery modular multiplication and RSA exponentiation techniques," IEE Proc.-Comput. Digit. Techn., vol. 151, no. 6, Nov.
[9] D. M. Harris, R. Krishnamurthy, M. Anders, S. Mathew, and S. Hsu, "An improved unified scalable radix-2 Montgomery multiplier," in Proc. 17th IEEE Symp. Comput. Arithmetic, Jun. 2005.
[10] M. D. Shieh, J. H. Chen, H. H. Wu, and W. C. Lin, "A new modular exponentiation architecture for efficient design of RSA cryptosystem," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 9, Sep.
[11] M. D. Shieh and W.-C. Lin, "Word-based Montgomery modular multiplication algorithm for low-latency scalable architectures," IEEE Trans. Comput., vol. 59, no. 8, Aug.
[12] M. Huang, K. Gaj, and T. El-Ghazawi, "New hardware architectures for Montgomery modular multiplication algorithm," IEEE Trans. Comput., vol. 60, no. 7, Jul.
[13] S.-R. Kuang, J.-P. Wang, K.-C. Chang, and H.-W. Hsu, "Energy-efficient high-throughput Montgomery modular multipliers for RSA cryptosystems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 11, Nov.
[14] S.-R. Kuang, K.-Y. Wu, and R.-Y. Lu, "Low-cost high-performance VLSI architecture for Montgomery modular multiplication," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 2, Feb.
[15] A. F. Tenca, G. Todorov, and Ç. K. Koç, "High-radix design of a scalable modular multiplier," in Proc. 3rd Int. Workshop Cryptogr. Hardw. Embedded Syst. (CHES), 2001.

[16] M. Knezevic, F. Vercauteren, and I. Verbauwhede, "Faster interleaved modular multiplication based on Barrett and Montgomery reduction methods," IEEE Trans. Comput., vol. 59, no. 12, Dec.
[17] A. Miyamoto, N. Homma, T. Aoki, and A. Satoh, "Systematic design of RSA processors based on high-radix Montgomery multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 7, Jul.
[18] A. Rezai and P. Keshavarzi, "High-throughput modular multiplication and exponentiation algorithms using multibit-scan multibit-shift technique," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, Sep.
[19] S. R. Dussé and B. S. Kaliski, "A cryptographic library for the Motorola DSP56000," in Advances in Cryptology - EUROCRYPT, vol. 473, I. B. Damgard, Ed. New York, NY, USA: Springer-Verlag, 1990.
[20] O. Arazi and H. Qi, "On calculating multiplicative inverses modulo 2^m," IEEE Trans. Comput., vol. 57, no. 10, Oct.
[21] T. Yanık, E. Savaş, and Ç. K. Koç, "Incomplete reduction in modular arithmetic," IEE Proc.-Comput. Digit. Techn., vol. 149, no. 2, Mar.
[22] C. D. Walter, "Montgomery exponentiation needs no final subtractions," Electron. Lett., vol. 35, no. 21, Oct.

Tuğrul Yanık received the B.S. degree in computer engineering from Agean University, Izmir, Turkey, in 1996, the M.S. degree in computer science and engineering from the Oregon Graduate Institute of Science and Technology, Beaverton, OR, USA, in 1999, and the Ph.D. degree in electrical and computer engineering from Oregon State University, Corvallis, OR, USA. He was an Assistant Professor with the Department of Computer Engineering, Celal Bayar University, Manisa, Turkey. His current research interests include cryptography and network security, computer arithmetic, and architecture.

Serdar Süer Erdem received the B.S. degree in electrical and electronics engineering from Boğaziçi University, Istanbul, Turkey, in 1992, the M.S. degree in electrical and computer engineering from Pennsylvania State University, State College, PA, USA, in 1996, and the Ph.D. degree in electrical and computer engineering from Oregon State University, Corvallis, OR, USA. He was a Research and Development Software Engineer with a number of technology companies. In 2004, he joined the Faculty of Electronics Engineering, Gebze Institute of Technology, Gebze, Turkey. His current research interests include cryptography, embedded systems security, computer arithmetic, finite fields, and network security.

Anıl Çelebi received the B.S., M.S., and Ph.D. degrees in electronics and communication engineering from Kocaeli University, Kocaeli, Turkey, in 2002, 2005, and 2008, respectively. Since 2002, he has been with the Department of Electronics and Telecommunications Engineering, Kocaeli University, where he is currently an Assistant Professor. His current research interests include VLSI design and implementation for analog/mixed signal systems, image processing, and video coding systems.


More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proposal to Improve Data Format Conversions for a Hybrid Number System Processor LUCIAN JURCA, DANIEL-IOAN CURIAC, AUREL GONTEAN, FLORIN ALEXA Department of Applied Electronics, Department of Automation

More information

Montgomery Multiplier and Squarer in GF(2 m )

Montgomery Multiplier and Squarer in GF(2 m ) Montgomery Multiplier and Squarer in GF( m ) Huapeng Wu The Centre for Applied Cryptographic Research Department of Combinatorics and Optimization University of Waterloo, Waterloo, Canada h3wu@cacrmathuwaterlooca

More information

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

A Low-Error Statistical Fixed-Width Multiplier and Its Applications A Low-Error Statistical Fixed-Width Multiplier and Its Applications Yuan-Ho Chen 1, Chih-Wen Lu 1, Hsin-Chen Chiang, Tsin-Yuan Chang, and Chin Hsia 3 1 Department of Engineering and System Science, National

More information

Chapter 2 Basic Arithmetic Circuits

Chapter 2 Basic Arithmetic Circuits Chapter 2 Basic Arithmetic Circuits This chapter is devoted to the description of simple circuits for the implementation of some of the arithmetic operations presented in Chap. 1. Specifically, the design

More information

On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli

On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli Behrooz Parhami Department of Electrical and Computer Engineering University of California Santa Barbara, CA 93106-9560,

More information

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 Design and Implementation of Carry Look Ahead Adder

More information

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics Outline Computer Arithmetic for Cryptography in the Arith Group Arnaud Tisserand LIRMM, CNRS Univ. Montpellier 2 Arith Group Crypto Puces Porquerolles, April 16 18, 2007 Introduction LIRMM Laboratory Arith

More information

Power Analysis to ECC Using Differential Power between Multiplication and Squaring

Power Analysis to ECC Using Differential Power between Multiplication and Squaring Power Analysis to ECC Using Differential Power between Multiplication and Squaring Toru Akishita 1 and Tsuyoshi Takagi 2 1 Sony Corporation, Information Technologies Laboratories, Tokyo, Japan akishita@pal.arch.sony.co.jp

More information

Radix-4 Vectoring CORDIC Algorithm and Architectures. July 1998 Technical Report No: UMA-DAC-98/20

Radix-4 Vectoring CORDIC Algorithm and Architectures. July 1998 Technical Report No: UMA-DAC-98/20 Radix-4 Vectoring CORDIC Algorithm and Architectures J. Villalba E. Antelo J.D. Bruguera E.L. Zapata July 1998 Technical Report No: UMA-DAC-98/20 Published in: J. of VLSI Signal Processing Systems for

More information

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Security Level of Cryptography Integer Factoring Problem (Factoring N = p 2 q) December Summary 2

Security Level of Cryptography Integer Factoring Problem (Factoring N = p 2 q) December Summary 2 Security Level of Cryptography Integer Factoring Problem (Factoring N = p 2 ) December 2001 Contents Summary 2 Detailed Evaluation 3 1 The Elliptic Curve Method 3 1.1 The ECM applied to N = p d............................

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 8, August 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient

More information

CS 140 Lecture 14 Standard Combinational Modules

CS 140 Lecture 14 Standard Combinational Modules CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier

More information

Implementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester,

More information

assume that the message itself is considered the RNS representation of a number, thus mapping in and out of the RNS system is not necessary. This is p

assume that the message itself is considered the RNS representation of a number, thus mapping in and out of the RNS system is not necessary. This is p Montgomery Modular Multiplication in Residue Arithmetic Jean-Claude Bajard LIRMM Montpellier, France bajardlirmm.fr Laurent-Stephane Didier Universite de Bretagne Occidentale Brest, France laurent-stephane.didieruniv-brest.fr

More information

Exponentiation and Point Multiplication. Çetin Kaya Koç Spring / 70

Exponentiation and Point Multiplication.   Çetin Kaya Koç Spring / 70 Exponentiation and Point Multiplication 1 2 3 4 5 6 8 7 10 9 12 16 14 11 13 15 20 http://koclab.org Çetin Kaya Koç Spring 2018 1 / 70 Contents Exponentiation and Point Multiplication Exponentiation and

More information

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis 2015 IEEE 22nd Symposium on Computer Arithmetic New Bit-Level Serial GF 2 m ) Multiplication Using Polynomial Basis Hayssam El-Razouk and Arash Reyhani-Masoleh Department of Electrical and Computer Engineering

More information

PARALLEL MULTIPLICATION IN F 2

PARALLEL MULTIPLICATION IN F 2 PARALLEL MULTIPLICATION IN F 2 n USING CONDENSED MATRIX REPRESENTATION Christophe Negre Équipe DALI, LP2A, Université de Perpignan avenue P Alduy, 66 000 Perpignan, France christophenegre@univ-perpfr Keywords:

More information

Svoboda-Tung Division With No Compensation

Svoboda-Tung Division With No Compensation Svoboda-Tung Division With No Compensation Luis MONTALVO (IEEE Student Member), Alain GUYOT Integrated Systems Design Group, TIMA/INPG 46, Av. Félix Viallet, 38031 Grenoble Cedex, France. E-mail: montalvo@archi.imag.fr

More information

Speeding Up Bipartite Modular Multiplication

Speeding Up Bipartite Modular Multiplication Speeding Up Bipartite Modular Multiplication Miroslav Knežević, Frederik Vercauteren, and Ingrid Verbauhede Katholieke Universiteit Leuven Department of Electrical Engineering - ESAT/SCD-COSIC and IBBT

More information

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field Khalid Javeed BEng, MEng A Disertation submitted in fulfilment of the requirements for the award of Doctor

More information

KEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA.

KEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA. GLOBAL JOURNAL OF ADVANCED ENGINEERING TECHNOLOGIES AND SCIENCES DESIGN OF A QUINARY TO RESIDUE NUMBER SYSTEM CONVERTER USING MULTI-LEVELS OF CONVERSION Hassan Amin Osseily Electrical and Electronics Department,

More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 6-8, 007 653 Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

More information

Gurgen Khachatrian Martun Karapetyan

Gurgen Khachatrian Martun Karapetyan 34 International Journal Information Theories and Applications, Vol. 23, Number 1, (c) 2016 On a public key encryption algorithm based on Permutation Polynomials and performance analyses Gurgen Khachatrian

More information

An Effective New CRT Based Reverse Converter for a Novel Moduli Set { 2 2n+1 1, 2 2n+1, 2 2n 1 }

An Effective New CRT Based Reverse Converter for a Novel Moduli Set { 2 2n+1 1, 2 2n+1, 2 2n 1 } An Effective New CRT Based Reverse Converter for a Novel Moduli Set +1 1, +1, 1 } Edem Kwedzo Bankas, Kazeem Alagbe Gbolagade Department of Computer Science, Faculty of Mathematical Sciences, University

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS

EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 EFFICIENT MULTIOUTPUT CARRY LOOK-AHEAD ADDERS B. Venkata Sreecharan 1, C. Venkata Sudhakar 2 1 M.TECH (VLSI DESIGN)

More information

Performance Evaluation of Signed-Digit Architecture for Weighted-to-Residue and Residue-to-Weighted Number Converters with Moduli Set (2 n 1, 2 n,

Performance Evaluation of Signed-Digit Architecture for Weighted-to-Residue and Residue-to-Weighted Number Converters with Moduli Set (2 n 1, 2 n, Regular Paper Performance Evaluation of Signed-Digit Architecture for Weighted-to-Residue and Residue-to-Weighted Number Converters with Moduli Set (2 n 1, 2 n, 2 n +1) Shuangching Chen and Shugang Wei

More information

National Taiwan University Taipei, 106 Taiwan 2 Department of Computer Science and Information Engineering

National Taiwan University Taipei, 106 Taiwan 2 Department of Computer Science and Information Engineering JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 907-919 (007) Short Paper Improved Modulo ( n + 1) Multiplier for IDEA * YI-JUNG CHEN 1, DYI-RONG DUH AND YUNGHSIANG SAM HAN 1 Department of Computer Science

More information