FPGA Implementation of Pairings using Residue Number System and Lazy Reduction

Size: px
Start display at page:

Download "FPGA Implementation of Pairings using Residue Number System and Lazy Reduction"

Transcription

1 FPGA Implementation of Pairings using Residue Number System and Lazy Reduction Ray C.C. Cheung 1, Sylvain Duquesne 2, Junfeng Fan 4, Nicolas Guillermin 2,3, Ingrid Verbauhede 4, and Gavin Xiaoxu Yao 1 1 Department of Electronic Engineering City University of Hong Kong, Hong Kong SAR, r.cheung@cityu.edu.hk, gavin.yao@student.cityu.edu.hk 2 IRMAR, UMR CNRS 6625, Université Rennes 1 Campus de Beaulieu, Rennes cedex, France 3 DGA.IS, La Roche Marguerite Bruz, France sylvain.duquesne@univ-rennes1.fr, nicolas.guillermin@m4x.org 4 Katholieke Universiteit Leuven, COSIC & IBBT Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium {junfeng.fan, ingrid.verbauhede}@esat.kuleuven.be Abstract. Recently, a lot of progress has been made in the implementation of pairings in both hardare and softare. In this paper, e present to FPGA-based high speed pairing designs using the Residue Number System and lazy reduction. We sho that by combining RNS, hich is naturally suitable for parallel architectures, and lazy reduction, hich performs one reduction for multiple multiplications, the speed of pairing computation in hardare can be largely increased. The results sho that both designs achieve higher speed than previous designs. The fastest version computes an optimal ate pairing at 126-bit security level in ms, hich is 2 times faster than all previous hardare implementations at the same security level. Keyords: Optimal Pairing, Residue Number System, Lazy Reduction, FPGA 1 Introduction and Motivation Bilinear pairings on elliptic curves have been introduced in cryptography in the middle of 90 s for cryptanalysis [18, 33]. In 2000, Joux introduced the first constructive use of pairings ith a tripartite key exchange protocol [25]. In the last decade many pairing-based schemes such as identity-based encryption [10], This ork as supported in part by the European Commission s ECRYPT II NoE (ICT ), by the Belgian State s IAP program P6/26 BCRYPT, by the K.U. Leuven-BOF (OT/06/40), by the Research Council K.U. Leuven: GOA TENSE (GOA/11/007), by French ANR projects no. 07-BLAN-0248 ALGOL and 09- BLAN CHIC, and by City University of Hong Kong Start-up Grant

2 identity-based signatures [12] and short signatures [11] have been proposed and studied. Compared ith other popular public key cryptosystems, e.g. Elliptic Curve Cryptography (ECC) [30, 34] and RSA [41], pairing computation is much more complicated. For this reason, efficient implementation of cryptographic pairings has received increasing interests [3, 8, 16, 20, 23, 27, 36]. The computation of a pairing can be broken don into modular operations in the underlying fields. For example, one optimal ate pairing [44] defined on a 256-bit Barreto-Naehrig (BN) curve [7] requires around 10 4 modular multiplications [3]. Thus, having an efficient modular multiplier is the key step to a high performance pairing processor. In this ork, e are interested in hardare implementation of pairings over large characteristic fields. In this case one possible optimization is to use lazy reduction. Lazy reduction in pairing computation as introduced by Scott [42] and then generalized by Aranha et al. in [3]. In short, it performs one reduction for expressions like A i B j, here A i, B j F p. Aranha et al. have shon that lazy reduction can significantly speed up optimal ate pairings in softare [3]. As suggested by Duquesne [14], lazy reduction can be combined ith the Residue Number System (RNS) to further reduce the complexity. Besides, the RNS distributes computation over a group of small integers, and is naturally suitable for parallel implementations [22, 29]. In this paper, e propose to FPGA-based pairing processors that use RNS representation and lazy reduction. The first design, referred as Design I, is based on a general (but enhanced) RNS data-path. Design I is scalable in terms of security level and flexible in terms of target devices. On an Altera Stratix III FPGA, Design I computes a pairing at 126-bit security level in 1.07 ms. The second design, referred as Design II, has an optimized architecture for pairings over 254-bit curves. We use a set of parameters that leads to a reduced complexity and an optimized datapath that benefits from the reduction. Design II computes a pairing at 126-bit security level in ms on a Xilinx Virtex-6 FPGA. To the best of our knoledge, these are the first hardare designs of pairing using RNS, and the designs outperform all previous hardare implementations at a similar security level [16, 19, 27]. The rest of the paper is organised as follos. Section 2 and Section 3 provide the background on optimal ate pairing and RNS, respectively. In Section 4 and Section 5 e describe the architecture of Design I and Design II. Section 6 describes the control of the data-path and the high level scheduling. We give a detailed analysis of the performance in Section 7. Finally, Section 8 concludes the paper. 2 Optimal Ate Pairings 2.1 Pairings on Barreto-Naehrig Curves A bilinear pairing is a non-degenerate map from G 1 G 2 to G T hich is linear in both components. Popular pairings such as Tate pairing [6], ate pairing [24], R-ate pairing [32], optimal pairing [44] choose G 1 and G 2 to be specific cyclic subgroups of E(F p k), and G T to be a subgroup of F p k.

3 Let F p be a finite field and let E be an elliptic curve defined over F p. Let l be a large prime dividing #E(F p ) and k the embedding degree ith respect to l, namely, the smallest positive integer k such that l p k 1. Small embedding degrees can be easily obtained using supersingular curves. Hoever, it is too small (k 2) if large characteristic base fields are used. Thus, e use ordinary curves ith prescribed embedding degrees constructed via the complex multiplication method as surveyed in [17]. We focus on the most popular one to date, namely the Barreto-Naehrig curves [7]. The reason for their popularity is that they are ell-suited for 128-bit security level and they have degree 6 tist. Let u Z such that p = 36u u u 2 + 6u + 1 and l = 36u u u 2 + 6u + 1 are prime. A BN curve is an elliptic curve defined over F p by E : y 2 = x 3 + b, here b 0 such that #E = l, and it has 12 as an embedding degree. In this paper, e mainly focus on the discussion of optimal ate pairing [36] because it is the most efficient to date for BN curves. Let r = 6u + 2, an optimal ate pairing on BN curves is defined as follos [2, 36]: a opt : E(F p 12) Ker(π p p) E(F p )[l] F p 12/ ( F p 12 ) l (Q, P ) ( ) p 12 1 l f (r,q) (P ) g (rq,πp(q))(p ) g (rq+πp(q), πp 2(Q)) (P ) here π p is the Frobenius map on the curve (π p (x, y) = (x p, y p )), and g (Q1,Q 2) is the line through Q 1 and Q Pairing Computation and Parameter Selection Pairing computation The computation consists of to main functions, f (r,q) and f p12 1 l. The function f (r,q) has the folloing divisor: div(f (r,q) ) = r(q) (rq) (r 1)(O). It is normally computed using a double-and-add method (also knon as Miller s loop [35]). Concerning f p12 1 l, also knon as the final exponentiation, Koblitz and Menezes sho in [31] that it can be split in to steps due to the integer factorization p 12 1 l = ( p 6 1 ) ( p ) ( p 4 p 2 ) + 1. l The first step is poering to p 6 1 and to p This is easily obtained via cheap Frobenius computations and an inversion. The second step is poering to hich is called the hard part of the final exponentiation. p 4 p 2 +1 l

4 Parameter selection The selection of parameters has an essential impact on the security and the performance of a pairing computation. It is explained in [39] ho to generate BN curves ith nice properties. For this ork e choose to curves both ith b = 2. The first one, BN 126, is defined by u = ( ) and has already been used in [2, 39]. It ensures only 126 bits of security, but is ell suited to registers on a general-purpose CPU hen lazy reduction is used. Hoever, FPGA architectures have no such constraints, since multipliers of larger size can alays be constructed ith DSP slices. Hence, e also consider a curve BN 128 defined by u = ( ) hich ensures 128 bits of security. Finally, e propose BN 192, the BN curve defined by u = ( ). Optimal pairing defined on this curve provides 192 bits of security [38]. In the three cases, the extension fields are defined as follos: F p 2 = F p [i]/(i 2 + 1) F p 6 = F p 2[β]/(β 3 (1 + i)) F p 12 = F p 6[Γ ]/(Γ 2 β) = F p 2[γ]/(γ 6 (1 + i)) This toer of extensions has many advantages, the most important being an efficient multiplication algorithm for the canonical polynomial base. Note that BN curves alays have degree 6 tists. This means E is isomorphic over F p 12 to a curve E defined by y 2 = x 3 + b ζ, here ζ is neither a square nor a cube in F p 2. In our case e take ζ = 1 + i so that it defines both the sextic extension of F p 2 and the tist. Then e can define tisted versions of pairings on E (F p 2) E(F p )[l]. In other ords, the coordinates of Q can be ritten as (x Q ζ 1 3, y Q ζ 1 2 ) here x Q and y Q are in F p 2. For u selected as a negative integer, the optimal ate pairing for BN curves is computed by Algorithm 1 [2] here dbl, add and hard-part are given in appendix. 3 Residue Number System A Residue Number System (RNS) represents a large integer using a set of smaller integers. Let B = {b 1, b 2,..., b n } be a set of pairise co-prime integers, and M B = n i=1 b i. For any integer X, 0 X < M B, there is a unique RNS representation on B: {X} B = {x 1, x 2,..., x n }, here x i = X bi, 1 i n. Throughout the paper e use a b to denote a mod b. Given {X} B, one can recover X using the Chinese Remainder Theorem (CRT): n X = x i B 1 i B i here B i = bi MB M B. (1) b i i=1 The set B is also knon as a base, and each element b i, 1 i n, is called an RNS modulus or an RNS channel. RNS representation admits efficient parallel computations. Consider to integers X, Y and their RNS representations {X} B = {x 1, x 2,..., x n } and {Y } B = {y 1, y 2,..., y n }, then e have { X Y MB } B = { x 1 y 1 b1,..., x n y n bn }, {+,,, /}. (2)

5 Algorithm 1 Optimal ate pairing on BN curves for u < 0 Require: P E(F p)[l], Q = (x Qγ 2, y Qγ 3 ) E(F p 12) Ker(π p p) ith x Q and y Q F p 2, r = 6u + 2 = s 1 i=0 ri2i, here u < 0. Ensure: a opt(q, P ) F p 12 1: T = (X T γ 2, Y T γ 3, Z T ) (x Qγ 2, y Qγ 3, 1), f 1 2: for i = s 2 donto 0 do 3: T, g dbl(t, P ), f f 2 g 4: if r i = 1 then 5: T, g add(t, Q, P ), f f g 6: end if 7: end for 8: T T, f f p6 (f p6 is equivalent to f 1 as noticed in [3]) 9: Q 1 π p(q), Q 2 π p(q 1) 10: T, g add(t, Q 1, P ), f f g 11: T, g add(t, Q 2, P ), f f g ( ) p : f f p6 1 13: f hard-part(f, u ) 14: return f Note that the division is available only if Y is co-prime ith M B. For all these operations, computations beteen x i and y i have no dependency on other channels, hich makes RNS naturally suitable for parallel implementations. The computation of X bi is called a channel reduction. To accelerate this operation, pseudo-mersenne numbers of the form b i = 2 ε i, here ε i < 2 /2, are typically selected as RNS moduli. Hence, the computation of X bi is performed using 2 times of X X/2 ε i + (X mod 2 ), and a correction step in the end to bring the result back to the range [0, b i ). 3.1 RNS Montgomery Reduction Using RNS representation ensures efficient computation in Z/M B Z. Unfortunately, it can t be applied directly in F p since M B is not prime. One ay to utilize RNS for field multiplication is to combine RNS and Montgomery reduction [22, 29]. This is shon in Algorithm 2. RNS Montgomery reduction requires to bases, B and C, ith M C co-prime to M B. The reason of including C is that division by M B is not possible in B. Note that the size of M B and M C, compared to p, determine the upper bound of input X. Guillermin found that if X < αp 2, M B > αp and M C > 2p, then Algorithm 2 has output S < 2p [22, Proposition 1]. This is an important principle for base selection. 3.2 Base Extension The operation to transform the representation in one RNS base to another base is called Base Extension (BE). To compute {X} C = {x 1, x 2,..., x n} from {X} B =

6 Algorithm 2 RNS Montgomery reduction [5] Require: RNS bases B and C ith M B > αp, M C > 2p, p coprime ith M B M C, {X} B and {X} C being the RNS representations of X < αp 2. Precomputed: { p 1 MB } B, { M 1 B M } C C and {p} C. Ensure: {S} B, {S} C such that S p = XM 1 B p and S < 2p 1: {Q} B {X} B { p 1 } B 2: {Q} B Base Extension {Q} C 3: {S} C ( {X} C + {Q} C {p} C ) {M 1 B 4: {S} B Base Extension {S} C } C {x 1, x 2,..., x n }, one can use the Posch-Posch method [40]. Given {X} B, for (1), there must exist an integer λ < n such that: n X = x i B 1 n n i B i = ξ i B i = ξ i B i λ M B (3) bi MB MB i=1 i i=1 here ξ i = x i B 1, 1 i n. In the Posch-Posch method, λ can be bi calculated by the folloing equation: n λ = i=1 ξ i B i n = M B i=1 In [29], ξ i /b i is further approximated by ξ i /2 as b i is chosen as a pseudo- Mersenne number near 2. Once λ is obtained, {X} C = {x 1,..., x n} can be computed as follos: n n x j = ξ i B i λ M B = ξ i B i cj λ M B cj. (5) cj cj i=1 B i cj and M B cj, 1 i, j n, can be precomputed once B and C are fixed. The folloing algorithm describes the computation of X ck. i=1 ξ i b i i=1 (4) 4 Design I: A Scalable Architecture In this section e propose an enhancement of the Cox-Roer architecture first proposed in [29] such that it is suitable for pairing computation. 4.1 Cox-Roer Architecture The Cox-Roer architecture as first proposed by Kaamura et al. in [29]. It as first implemented in a VLSI design of an RSA cryptosystem [37]. It as later enhanced by Guillermin [22] to support all arithmetic operations in F p and fast Elliptic Curve Scalar Multiplications (ECSM).

7 Algorithm 3 Base extension algorithm for k-th element of X [29] Require: X bi for i {1,.., n} Ensure: X ck Precomputed: B 1 i bi, B i ck, 1 i n, M B ck 1: ξ i x i B 1 i bi for i {1,.., n} 2: ψ 0, z 0 3: for i = 0 to (n 1) do 4: ψ ψ + ξ i 5: ρ ψ/2 //ρ {0, 1} 6: ψ ψ 2 7: z z + (ξ i B i ck ) ρ M B ck 8: end for 9: return z Fig. 1 shos our Cox-Roer implementation. The top-level structure is similar to the one proposed by [29]. It is consists of n similar Roer units performing in parallel operations in one RNS channel of B or C. The Cox unit is only used during the base extension (Algorithm 3) to generate the ρ. The Roer unit normally consists of a multiplier and channel reduction logic, and serves as the orkhorse of both multiplication (operation in (2)) and reduction (step 1, 3 in Algorithm 2 and step 1, 7 in Algorithm 3). The Cox-Roer is driven by a microcoded sequencer, and the sequencer can be easily reprogrammed to provide support for different algorithms. While the basic structure of our design resembles the Cox-Roer architecture in Guillermin s ECC processor [22], our architecture has an optimized pipeline structure and a more aggressive memory organization specifically designed for pairing support. 4.2 Cox-Roer Parametrization for Pairing First e need to parametrize the Cox-Roer to support the pairings on the three curves defined in Section 2.2. Popular Altera and Xilinx FPGAs have embedded multipliers or even multipliers (Virtex-5 and higher). Moreover, a full multiplier can be built by efficiently combining several such small multipliers. It is thus natural to select =18 or =36. For the implementation of BN 126 or BN 128, =36 gives the best trade-off. Indeed the gain in frequency brought by smaller multipliers and adders does not compensate the necessary cycle surplus of reductions. The parameter n can then be set to 8 for BN 126 and BN 128, and 19 for BN 192. Because of our chosen arithmetic (see Section 6), the value α defined in Section 3.1 reaches 198 for all the curves. The orst case is reached during the schoolbook multiplication in F p 12. One can verify that, in order to have M B > αp, =33 is enough to support BN 126 and =34 for both BN 126 and BN 128. For BN 192, is set to 35. In this architecture, e used the same method to select bases as in [22], ith pseudo-mersennes of the form t i = 2 ε i, t i B C and ε i positive.

8 An adaptation of Guillermin s architecture is necessary to provide pairing support. As the number of local variables and precomputations is much larger for pairings than that of ECSM (in fact, the Miller s loop has a built-in ECSM), e use a single triple port RAM of 256 ords instead of the ROM and a group of 16 registers to store precomputed values and temporary results. This is enough to support all curves listed in Section 2.2. Fig. 1. Design I: architecture of the Cox-Roer and its pipeline structure command sequencer in main bus out RAM 2 ti q Cox Ro1 Ro2 Ron main MUX q + 1 RAM q + 1 coxi main bus sequencer Triple port RAM 2q + 1 main bus ALU RAM inv main MUX register modular adder MUX modular subtracter unsigned multiplier 0 or 1 bit shifter

9 4.3 Pipeline Architecture Our goal is to keep the maximal frequency already available in [22]. Additive hardare must be carefully introduced, to keep the critical path under control. On the other hand, e can easily raise the pipeline depth ithout a lot of cycle loss. Indeed, pairing computation has more parallelizable operations than classical elliptic curve scalar multiplication over F p. The architecture in [37] and [22] uses 2-stage and 5-stage pipelines, respectively. For the implementation of pairings, e found that a pipeline of up to 10 stages can still be efficiently filled during the hole pairing computation except for F p inversion. Based on Guillermin s architecture, to accumulators are included in the pipeline to efficiently support pairing computation. Fig. 1 shos the adapted pipeline architecture. The first 4 stages perform a b ti, here t i B C. We also introduce shift and MUX units in the first 4 stages to compute 2 a b ti, hich are utilized to accelerate squarings in the extension fields. Three multipliers (, (q+1) and (q+1) q) are used, here q= log 2 ε i. In the 6 th stage e implemented to independent accumulators. They are preceded by a subtracter (hich computes a b ti ) and a MUX in the 5 th stage. The output of the first 4 stages can then be independently added, subtracted or ignored by both accumulators. These to accumulators, together ith the use of toer extensions, save a lot of cycles during the pairing computation. See Section 6 for details. On this architecture, an RNS multiplication costs 2 cycles and the results can be accumulated immediately. An RNS reduction costs 2n+3 cycles as previously described in [22]. 5 Design II: Hardare/Algorithm Co-optimization In this section, e propose an optimized design that achieves an even higher throughput. The improvement comes mainly from to tricks: a set of good bases that admits a less-expensive base extension, and a fine-tuned pipeline structure that allos a higher frequency. 5.1 Base Selection Revisited The core observation here is that the complexity of base extension can be reduced if the moduli in the to bases are close to each other. The base extension is the most computational expensive operation in the RNS Montgomery algorithm. It requires n 2 times of multiplications. Indeed, (5) can be ritten as a matrix multiplication belo. x 1. x n B 1 c1 B n c B 1 cn B n cn ξ 1. ξ n M B c1 γ. M B cn (6)

10 Note that the elements in the matrix, B i cj, 1 i, j n, are constants and are generated as follos: B i cj = n k=1,k i b k = cj n k=1,k i (b k c j ) (7) cj Define B i,j := n k=1,k i (b k c j ). When b k and c j are close to each other, the difference b k c j is small. In practice, B i,j could be much smaller than c j if n is relatively small. Note that using B i cj or B i,j makes no difference in the final results due to the channel reduction on the products. Furthermore, Bi,j for 1 i, j n ill be predictably divisible by 2 n 2 if c j is odd. Consider the to bases B = {b 1, b 2,, b n } and C = {c 1, c 2,, c n }. Since there is at most one even number in B C, (b i c j ) is divided by 2 unless b i or c j is even. In practice, Bi,j can have more than n 2 zero bits in the least significant bits (LSBs). To shrink the operand size of the matrix multiplications, e can use truncate the least significant zeros of Bi,j, and restore the correct results by a simple left-shift. We denote B i,j the B i,j after truncation. Considering the size of p and the Cox-Roer architecture, e again select n = 8 and thus b i (and c j ) close to The selection of moduli also takes into account the size of B i,j for 1 i, j n. The folloing 16 moduli ( = 33) ere selected as the bases. B = {2 1, 2 9, 2 + 3, , 2 + 5, 2 + 9, 2 31, }, C = {2, 2 + 1, 2 3, , 2 13, 2 21, 2 25, 2 33}. After applying the truncation of zero bits, e manage to reduce the bitlength of all B i,j (actually, B i,j ) from standard 34 to 25. While this complexity reduction seems negligible, it admits notable savings in hardare design. A detailed analysis of the base selection is given in Appendix A A Fine-tuned Roer for Pairing Computation Our refinement focuses solely on the design of Roers. Fig. 2(c) shos the overvie of a Roer. It consists of a dual mode multiplier, 3 accumulators, a channel reduction module, a channel adder, a triple port RAM for multiplier inputs and a RAM for adder inputs. Dual mode multiplier The multiplier in the Roer is built to support the bases selected in Section 5.1. There are to types of multiplication executed in an RNS multiplication: multiplication (step 1, 3 of Algorithm 2) and multiplication (step 1, 7 of Algorithm 3). The DSP slices in recent Xilinx FPGAs are made of a signed bit multiplier, a 17-bit left shifter, and an accumulator. The dual mode multiplier is built ith four DSP slices, hich supports either unsigned multiplication or to signed multiplications in parallel. Fig. 2(a) illustrates the structure of the dual mode multiplier.

11 Fig. 2. The fine-tuned Roer architecture B[33:17] or B' i,j B[33:0] or B' i,j B' i,j+4 A[33:0] or ξ j B[16:0] or B' i,j+4 A[33:0] ξ j ξ j+4 A[33:0] or ξ j+4 ρ {0, 1, 2} B' i,j ξ j + B' i,j+4 ξ j+4 { M B ci, M C bi } AB[69:0] <<bias cj Acc0 Acc1 Acc2 +/- +/- <<17 {1, 2, 3, 6} AB[69:0] B' i,j ξ j + B' i,j+4 ξ j+4 (a) Dual mode multiplier [73:0] (b) Accumulators RAM0 Dual Mode Multiplier DSP DSP DSP DSP ξ j ξ j+4 [73:0] [73:33] [32:0] ε i 25x35 multiplier ith 2-stage pipeline realized by 2 DSP48E1s 35x25 35x25 [48:33] Adder logic ε i [32:0] Constant multiplier realized by adders RAM1 Accumulators Acc0 Acc1 Acc2 ρ t i +/- [33:0] Adder Subtracter Channel Adder Channel Reduction ξ i [33:0] Adder/Subtracter (c) Overvie of Roer i (d) Channel reduction For pairing computation, multiplication by small constants (2, 3, 6) is idely employed. Therefore, a constant multiplier is also included in the pipeline. Because to multiplications are executed simultaneously in the base extension, the number of cycles to perform the Cox-Roer algorithm (Algorithm 3) is reduced from n to n/2. This is a significant speedup of the base extension. Other optimizations The architecture of channel reduction is shon in Fig. 2(d). As the Hamming eights of all the ε i are either equal to or less than 3 in nonadjacent form, multiplication by ε i can be realized by 4 adders instead of multipliers. In order to maintain a high operating frequency, e use a three-stage pipeline to realize the channel reduction. To achieve the maximum usage of the multiplier, a separate adder together ith a dedicated RAM is included. Because of the lazy reduction inside the Roer, the rite port of RAM0 is not alays occupied by the multiplier. Therefore, the addition and the multiplication can run in parallel. While most of the

12 additions in the pairing computation are performed by the accumulators, the separate adder is useful in point additions and doublings. While this architecture requires less cycles for reduction than Design I, it is less scalable. Indeed, in order to support larger p, e need to choose larger n. The bitlength of B i,j increases quickly hen n goes up, and to multiplications in parallel becomes impossible on the current Roer. 6 Scheduling the Pairing Algorithm In this section e present the implementation choices for every step of the pairing computation. 6.1 Arithmetic in F p 2: Back to the Schoolbook Method Karatsuba and derived interpolation methods have been used intensively in field operations in the literature to save the expensive multiplication [3, 8, 23, 42]. Karatsuba uses 3 instead of 4 multiplications at the cost of 3 extra additions. In a normal positional number system, this method saves computation poer, as multiplications are much more expensive than additions. Hoever, in RNS, the complexities of a multiplication and an addition are the same. Hence, the schoolbook method involves less operations (counting both additions and multiplications) and is preferred. On both architectures, a full F p 2 multiplication finishes in 8 cycles, and a squaring in only 6 cycles. 6.2 Arithmetic in F p 12: Interpolation ith Parsimony Let X = {x 1,, x 12 }, Y = {y 1,, y 12 } and Z = X Y F p 12. To accelerate F p 12 arithmetic, e aim to execute only once x i y j, for all {i, j} {1,, 12} 2. As i 2 = 1 and γ 6 =(1 + i), the result of x i y j is either added to or subtracted from at most to components of Z. This is the reason for the presence of to independent accumulators. Moreover, the result of the multiplication can be multiplied by 2 on the fly before it is accumulated, hich speeds up the squaring in F p 12. We estimate if the interpolation techniques at higher levels may save cycles or not. We exclude the reduction step, therefore the cycle count is equivalent on the flexible as ell as the optimized design. We only consider Karatsuba in F p 12 over F p 6 hich is the best case for these techniques (interpolation at other levels F p 12/F p 4, F p 6/F p 2 or F p 4/F p 2 ould give orse results). Four different types of multiplications and squaring are needed during pairing, each one requiring a different algorithm: Squaring during the Miller loop: the operand of this squaring has no specific structure. We propose to use schoolbook squaring. Thanks to the multiplication by 2 included in the pipeline of both Design I and Design II, a squaring costs 156 cycles. The interpolation on F p 12/F p 6 leads to the folloing

13 formula: (X H + Γ X L ) 2 = ( (X H + X L )(X H + Γ 2 X L ) (1 + Γ 2 )X H X L ) + Γ (2X H X L ) (ith X H, X L F p 6). We found no method to go belo 156 cycles on both designs. Multiplication by the line in the Miller loop: half the values of B are equal to 0, therefore the schoolbook multiplication costs 144 cycles. Interpolation techniques do not manage to take the advantage of the half size operand. Squaring during final exponentiation: during the hard part of FE, the operands to be squared are in G Φ6 (F p 2). We use the formulae given in [21] (Algorithm A.1). On Design I, e need 84 cycles to implement it. This approach is much more efficient than interpolation techniques. 5 Multiplication during final exponentiation: Without counting pipeline bubbles, Karatsuba requires 278 and 254 cycles on Design I and Design II, respectively, hile the schoolbook method requires 288 on both. Note that there are only 20 such multiplications in the hole pairing for BN 126, and the savings are really limited. Because of the moderate gain and the complication brought to the sequencer, e decided not to implement it. 6.3 F p Inversion In the final exponentiation of pairing, an F p 12 inversion is required. It can be done ith only one inversion, 35 reductions and additional multiplications/additions in F p [14, 23], but the remaining inversion in F p is very expensive. Since comparison in RNS is difficult, inversion through exponentiation (X 1 X p 2 mod p) is used. For this operation, e use a simple square and multiply algorithm ith least significant bit first. It is more memory consuming, but it allos to perform multiplications in parallel. On a pipelined datapath, LSB-first exponentiation is more efficient than the MSB-first method. In total, an inversion needs log(p) 1 squarings and many multiplications, but the cost of multiplications is hidden in the pipeline. 6.4 Higher Level Scheduling Based on the operation over F p and its extension fields, e are ready to implement the hole pairing computation. The folloing to points are important to efficiently utilize the datapath and to have limited memory usage: control of the number of local variables to limit the size of RAMs; control of the dependency beteen operations, to avoid pipeline bubbles. For the first point, the step hich requires the most live local variables is in the hard-part of the final exponentiation. We use the algorithm described by Scott et al. [43], hich is to date the fastest ay to compute the hard-part (Algorithm A.1). To keep its full poer hile limiting the number of local variables 5 More recently, Karabina introduced a compressed form for elements in F p 12 hich require less operations to be squared [3, 28]. Unfortunately, this method involves extra inversions so that it is not suitable for our designs.

14 (ith a classical register allocation technique) e slightly rearranged their original formulae. For the second point, excluding the F p inversion here idle states cannot be avoided, the most constrained part is the F p 2 arithmetic in the Miller loop. Therefore e rearranged the projective coordinates addition and doubling formulae to emphasize the inherent parallelism. Formulas can be found in Algorithm 4 and 5 in appendix. On both architectures, e managed to eliminate almost all pipeline bubbles in the Miller loop. Idle states on the multiplier remain less than %1 of the time on both architectures in the pairing computation, excluding the F p inversion. 7 Implementation Results and Analysis 7.1 Area The prototype of the proposed pairing coprocessors ere implemented on commercial FPGAs. Table 1 gives the logic utilization of both designs. Design I We synthesized the first design for n = 8 on three different FPGAs : EP2C35: A lo cost (less than $100) 65 nm node Altera FPGA. It is designed for industrial series production; EP2S30: A 65 nm node high end series of Altera. It is picked for the sake of comparison ith the Xilinx Virtex-4 used in [15]. EP3SE50: A 40 nm node high end FPGA. Note that it is the smallest FPGA of the series. BN 192 (n = 19) is also implemented on this device. The area consumption is given in ALM, the equivalence of the Xilinx Virtex-4 slice, and in LE for the Cyclone, hich can be considered as a simple 4 4 LUT ith carry chain and registers. We refer the reader to [1] for details. We let the synthesizer decide ho to implement multiplications (ith speed constraints). Note that this choice lead to the maximal use of DSP blocks. We used also embedded RAMs: the RAM of each Roer is built the M4k or M9k (depending on the FPGA). We gave no instructions to the design suite for the implementation of the sequencer (containing 20 kb microcode), therefore they ere placed in the big available memories. The same design provides support for the to curves defined in 2.2, ith the only difference being the content of the RAM blocks (precomputed values and the sequencer). Design II Design II is implemented on a Xilinx Virtex-6 XC6VLX240T-2 FPGA, hich embeds DSP slices. As there are 8 Roers and each Roer contains 4 DSP slices, the total number of DSPs is 32. Data RAMs used in each Roer are implemented ith distributed memory blocks. The block RAMs (BRAMs) serve as the microcode sequencer. Thanks to the fine-tuned pipeline, the coprocessor can operate at 250MHz.

15 Table 1. Logic Utilization n Device Freq. Multipliers Logic Data Sequencer Elements Memory Cyclone II 91 MHz bit Mult LE 32 M4k 35 M4k 8 Stratix II 165 MHz 72 DSP18el 4227 ALMs 32 M4k 1 M512 Design I Stratix III 165 MHz 72 DSP18el 4233 ALMs 16 M9k 1 M M9k 19 Stratix III 131 MHz 171 DSP18el 9910 ALMs 38 M9k 1 M M9k Design II 8 Virtex MHz 32 DSP48E1s 7032 Slices Kb BRAMs 7.2 Performance Table 2 gives number of cycles used by the sub-functions used in optimal ate pairing on both Design I and Design II. Due to the carefully selected bases and dual mode multiplier, Design II achieved a loer cycle count. The improvement mainly comes from the speedup of RNS reduction: 19 cycles on Design I compared to 12 cycles on Design II hen n = Comparison and Discussion Table 3 lists the performance of softare and hardare implementations reported in recent literature. Both Design I and II achieve a better performance than previous hardare implementations [16, 19, 27]. Due to the use of different platforms, a fair comparison is difficult. Nevertheless, compared ith the design of [16] hich also uses Virtex-6, Design II achieves a speedup of factor 2. The softare implementations achieve very high performance. Although e did not break the softare record, the speed of our design is already close to that of softare. The speedup comes mainly from three improvements. First, RNS multiplication has loer complexity than traditional integer multiplications. For example, an 256-bit multiplication on RNS involves multiplications, hile a 256- bit integer multiplication requires multiplications using Karatsuba s method. Second, the use of lazy reduction reduces the cost of multiplications in extension fields. Third, RNS representation is very parallelization-friendly. Indeed, it only involves relatively small numbers (no long carry propagation) and alays uses data in the local memory (no inter-core communication overhead). The performance of both Design I and II has demonstrated the efficiency of RNS in pairing implementations, and confirms ith actual implementations on FPGA the analysis of [14]. Table 2. Cycle count in one optimal pairing Curve Mul./Red. 2T and T+Q and f 2 f g Miller s Final Total in F p g (T,T ) (P ) g (T,Q) (P ) Loop Exp. BN ,530 89, ,111 2 / Design I BN ,480 94, ,502 BN / , , ,849 Design II BN / ,116 81, ,111

16 Table 3. Performance comparison of softare and hardare implementations of pairings Design Pairing Security Platform Algorithm Area Freq. Cycle Delay [bit] [MHz] [ 10 3 ] [ms] Altera FPGA LE (Cyclone II) 35 mult. 126 Altera FPGA RNS 4233 A Design I optimal ate (Stratix III) Montgomery 72 DSPs 192 Altera FPGA 9910 A (Stratix III) 171 DSPs Design II optimal ate 126 Xilinx FPGA RNS 7032 slices (Virtex-6) Montgomery 32 DSPs [16] ate Xilinx FPGA Hybrid 4014 slices optimal ate (Virtex-6) Montgomery 42 DSPs Tate 1, [19] ate 128 Xilinx FPGA Blakley 52k Slices 50 1, optimal ate (Virtex-4) Tate 11, [27] ate 128 ASIC Montgomery 97 kgates 338 7, optimal ate (130 nm) 5, [15] Tate over Xilinx FPGA 4755 Slices 128 F (Virtex-4) - 7 BRAMs [2] optimal Eta Xilinx over F Virtex-4 Slices [23] ate - 15, bit Core2 Montgomery 2400 optimal ate 10, [20] ate bit Core2 Montgomery , [36] optimal ate 128 Core2 Quad Hybrid Mult , [8] optimal ate 126 Core i7 Montgomery , [3] optimal ate 126 Phenom II Montgomery , [4] η T over F Xeon , [9] η T over F Core i , [2] opt. Eta F Core i , Estimated by the authors. 8 Conclusions In this paper, e demonstrated that RNS together ith lazy reduction is a really competitive approach for pairing computation in hardare. Thanks to the use of RNS, both datapath and memory can be nicely parallelized. The results sho that both designs e proposed achieve higher speed than all previous designs in hardare. The fastest version computes an optimal ate pairing at 126-bit security level in ms, hich is 2 times faster than all previous hardare implementations. Moreover, e also reported the first hardare pairing implementation at 192-bit security level. For future ork, e ould like to further optimize the pipeline structure to achieve higher speed, particularly for 192-bit optimal pairing. We ould like implement using RNS a pairing processor that provides 256-bit security. For these levels, BN curves may become less competitive, and have to be compared to other approaches such like KSS curves [26]. Also, as shon in Section 5, carefully selected RNS bases can help to reduce the complexity of RNS base extension. Hoever, it is not clear ho much e can benefit from it hen the

17 number of channels is relatively large, and this is definitely part of the design exploration for the implementation of pairings at higher security levels. References 1. Altera eb site D. Aranha, J.-L. Beuchat, J. Detrey, and N. Estibals. Optimal eta pairing on supersingular genus-2 binary hyperelliptic curves. Cryptology eprint Archive, Report 2010/559, D. Aranha, K. Karabina, P. Longa, C. H. Gebotys, and J. López. Faster explicit formulas for computing pairings over ordinary curves. In Advances in Cryptology EUROCRYPT 2011, volume 6632 of LNCS, pages Springer, Heidelberg, D. Aranha, J. López, and D. Hankerson. High-speed parallel softare implementation of the η T pairing. In Topics in Cryptology - CT-RSA 2010, volume 5985 of LNCS, pages Springer, Heidelberg, J.-C. Bajard, L.-S. Didier, and P. Kornerup. An RNS Montgomery modular multiplication algorithm. Computers, IEEE Transactions on, 47(7): , P. Barreto, H. Kim, B. Lynn, and M. Scott. Efficient algorithms for pairing-based cryptosystems. In Advances in Cryptology CRYPTO 2002, volume 2442 of LNCS, pages Springer, Heidelberg, P. Barreto and M. Naehrig. Pairing-friendly elliptic curves of prime order. In Selected areas in cryptography SAC 2005, volume 3897 of LNCS, pages Springer, J.-L. Beuchat, J. González-Díaz, S. Mitsunari, E. Okamoto, F. Rodríguez- Henríquez, and T. Teruya. High-speed softare implementation of the optimal ate pairing over Barreto-Naehrig curves. In Pairing-Based Cryptography - Pairing 2010, volume 6487 of LNCS, pages Springer, Heidelberg, J.-L. Beuchat, E. López-Trejo, L. Martínez-Ramos, S. Mitsunari, and F. Rodríguez- Henríquez. Multi-core implementation of the Tate pairing over supersingular elliptic curves. In Cryptology and Netork Security, volume 5888 of LNCS, pages Springer, Heidelberg, D. Boneh and M. Franklin. Identity-based encryption from the Weil pairing. In Advances in Cryptology CRYPTO 2001, volume 2139 of LNCS, pages Springer, Heidelberg, D. Boneh, B. Lynn, and H. Shacham. Short signatures from the Weil pairing. Journal of Cryptology, 17(4): , J. C. Cha and J. H. Cheon. An identity-based signature from gap Diffie-Hellman groups. In International Workshop on Theory and Practice in Public Key Cryptography PKC 2003, volume 2567, pages 18 30, Springer, Heidelberg, C. Costello, T. Lange, and M. Naehrig. Faster pairing computations on curves ith high-degree tists. In Public Key Cryptography PKC 2010, volume 6056 of LNCS, pages Springer, Heidelberg, S. Duquesne. RNS arithmetic in F p k and application to fast pairing computation. Cryptology eprint Archive, Report 2010/555, to appear in Journal of Mathematical Cryptology. 15. N. Estibals. Compact hardare for computing the Tate pairing over 128-bitsecurity supersingular curves. In Pairing-Based Cryptography - Pairing 2010, volume 6487 of LNCS, pages Springer, Heidelberg, 2010.

18 16. J. Fan, F. Vercauteren, and I. Verbauhede. Efficient hardare implementation of F p-arithmetic for pairing-friendly curves. Computers, IEEE Transactions on, PP(99):1, D. Freeman, M. Scott, and E. Teske. A taxonomy of pairing-friendly elliptic curves. Journal of Cryptology, 23: , G. Frey and H.G. Rück. A remark concerning m-divisibility and the discrete logarithm in the divisor class group of curves. Mathematics of computation, 62(206): , S. Ghosh, D. Mukhopadhyay, and D. Roychodhury. High speed flexible pairing cryptoprocessor on FPGA platform. In Pairing-Based Cryptography - Pairing 2010, volume 6487 of LNCS, pages Springer, Heidelberg, P. Grabher, J. Großschädl, and D. Page. On softare parallel implementation of cryptographic pairings. In Selected Areas in Cryptography, volume 5381 of LNCS, pages Springer, Heidelberg, R. Granger and M. Scott. Faster squaring in the cyclotomic subgroup of sixth degree extensions. Public Key Cryptography PKC 2010, 6056: , N. Guillermin. A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p. In Cryptographic Hardare and Embedded Systems-CHES 2010, volume 6225 of LNCS, pages 48 64, Springer, Heidelberg, D. Hankerson, A. Menezes, and M. Scott. Softare Implementation of Pairings, volume 2 of Cryptology and Information Security Series, pages IOS Press, M. Joye and G. Neven edition, F. Hess, N.P. Smart, and F. Vercauteren. The Eta pairing revisited. Information Theory, IEEE Transactions on, 52(10): , A. Joux. A one round protocol for tripartite Diffie-Hellman. Journal of Cryptology, 17: , E. J. Kachisa, E. F. Schaefer, and M. Scott. Constructing brezing-eng pairingfriendly elliptic curves using elements in the cyclotomic field. In Pairing-Based Cryptography - Pairing 2008, volume 5209 of LNCS, pages , Springer, Heidelberg, D. Kammler, D. Zhang, P. Schabe, H. Scharaechter, M. Langenberg, D. Auras, G. Ascheid, and R. Mathar. Designing an ASIP for cryptographic pairings over Barreto-Naehrig curves. In Cryptographic Hardare and Embedded Systems-CHES 2009, volume 5747 of LNCS, pages Springer, Heidelberg, K. Karabina. Squaring in cyclotomic subgroups. Cryptology eprint Archive, Report 2010/542, S. Kaamura, M. Koike, F. Sano, and A. Shimbo. Cox-roer architecture for fast parallel montgomery multiplication. In Advances in Cryptology EUROCRYPT 2000, volume 1807 of LNCS, pages Springer, Heidelberg, N. Koblitz. Elliptic Curve Cryptosystem. Math. Comp., 48: , N. Koblitz and A. Menezes. Pairing-based cryptography at high security levels. Cryptography and coding, 3796:13 36, E. Lee, H.-S. Lee, and C.-M. Park. Efficient and generalized pairing computation on abelian varieties. Information Theory, IEEE Transactions on, 55(4): , A.J. Menezes, T. Okamoto, and S.A. Vanstone. Reducing elliptic curve logarithms to logarithms in a finite field. Information Theory, IEEE Transactions on, 39(5): , V. Miller. Uses of Elliptic Curves in Cryptography. In Advances in Cryptology: Proceedings of CRYPTO 85, volume 218 of LNCS, pages Springer, Heidelberg, 1985.

19 35. V. S. Miller. The Weil pairing, and its efficient calculation. Journal of Cryptology, 17: , /s M. Naehrig, R. Niederhagen, and P. Schabe. Ne softare speed records for cryptographic pairings. In Progress in Cryptology LATINCRYPT 2010, volume 6212 of LNCS, pages Springer, Heidelberg, H. Nozaki, M. Motoyama, A. Shimbo, and S. Kaamura. Implementation of RSA algorithm based on RNS montgomery multiplication. In Cryptographic Hardare and Embedded Systems-CHES 2001, volume 2162 of LNCS, pages , Springer, Heidelberg, National Institute of Standard and technology. Key management, management.html. 39. G.C.C.F. Pereira, M.A. Simplício, M. Naehrig, and P.S.L.M. Barreto. A family of implementation-friendly BN elliptic curves. Journal of Systems and Softare, K. Posch and R. Posch. Base extension using a convolution sum in residue number systems. Computing, 50:93 104, R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM, 21(2): , M. Scott. Implementing cryptographic pairings. In Pairing-Based Cryptography - Pairing 2007, volume 4575 of LNCS, pages Springer, Heidelberg, M. Scott, N. Benger, M. Charlemagne, L. Dominguez Perez, and E. Kachisa. On the final exponentiation for calculating pairings on ordinary elliptic curves. Pairing- Based Cryptography Pairing 2009, volume 5671 of LNCS, pages 78 88, Springer, Heidelberg, F. Vercauteren. Optimal pairings. Information Theory, IEEE Transactions on, 56(1): , A Appendix A.1 Sub-routines for Optimal Ate Pairing Algorithm 4 dbl, doubling step Require: T = (X T γ 2, Y T γ 3, Z T ) E(F p 12) ith X T, Y T and Z T F p 2, P = (x P, y P ) E(F p). Ensure: The point 2T and the evaluation in P of the equation of the tangent line in T to the curve up to multiplicative factors in F p 2. 1: B Y 2 T, C 3Z 2 T, D 2X T Y T 2: F B + 3iC, G B 3iC, H 3C, t 3 B + ic, A X 2 T, E 2Y T Z Y 3: X 2T DF, Y 2T G 2 + 4HC, Z 2T 4BE, t 0 Ey P, t 1 3Ax P 4: return (X 2T γ 2, Y 2T γ 3, Z 2T ), t 0 + t 1γ + t 3γ 3 These algorithms are specific to the BN curve y 2 = x and the toer of extensions given in Subsection 2.1. Jacobian coordinates are usually used for pairing computations [8, 23, 36] but projective coordinates are more interesting

20 in our case [3, 13]. Using the degree 6 tist on the curve, the point Q can be ritten as (x Q γ 2, y Q γ 3 ) ith x Q and y Q F p 2. The doubling step of the Miller loop consists of to stages: the doubling of a temporary projective point T = (X T γ 2, Y T γ 3, Z T ) ith X T, Y T and Z T F p 2 and the evaluation in P of the tangent line in T to the curve. This is given in Algorithm 4 here the classical formulae are rearranged in a ay to highlight the reductions (every temporary result needs a reduction except F, G, H and t 3 ), and the inherent parallelism in the local variables (each line can be implemented in random order). This is important to avoid idle states in Cox-Roer. The total cost of this step is 4 multiplications, 5 squarings, 8 reductions in F p 2 and 2 modular multiplications of an element of F p 2 by an element of F p. Note that multiplications like 2X T Y T can be transformed into squaring of (X T + Y T ) at the cost of some extra additions [3, 13], thus it is not interesting for our design. In the same ay, the cost of the addition step given by Algorithm 5 is 11 multiplications, 2 squarings, 11 reductions in F p 2 and 2 modular multiplications of an element of F p 2 by an element of F p. Algorithm 5 add, addition step Require: T = (X T γ 2, Y T γ 3, Z T ) E(F p 12) ith X T, Y T and Z T F p 2, Q = (x Qγ 2, y Qγ 3 ) E(F p 12), P = (x P, y P ) E(F p). Ensure: The point T + Q and the evaluation in P of the equation of the line passing through T and Q up to multiplicative factors in F p 2. 1: E x QZ T X T, F y QZ T Y T 2: E 2 E 2, F 2 F 2 3: A F 2Z T 2X T E 2 EE 2, B X T E 2, E 3 EE 2 4: X T +Q AE, Z T +Q Z T E 3, t 3 F x Q Ey Q 5: Y T +Q F (B A) y QE 3, t 0 Ey P, t 1 F x P 6: return (X T +Qγ 2, Y T +Qγ 3, Z T +Q), t 0 + t 1γ + t 3γ 3 Algorithm 6 hard-part, hard part of the final exponentiation according [43] Require: f F p 12 of order p 4 p 2 + 1, x = u. Ensure: f (p4 p 2 +1)/l ith p and l as in 2.1. {Computation of the y i} ( ( 1: y 0 f p f p2 f p3, y 1 f x, y 3 y1 x, y 5 y3 x, y 4 y p 5, y6 y4y5 = f x3 f x3) p) (f x2) p) 2: y 5 y p 3, y2 y 1 5 (=, y4 y1y2 f x / ( ( 3: y 2 y p 5 = f x2) p 2), y 5 y 1 3 ( = 1/f x2), y 3 y p 1 ((mx ) p ), y 1 f 1 {Multi-addition chain for computing y 0.y 2 1.y 6 2.y 12 3.y 18 4.y 30 5.y 36 6 } 4: t 0 y 2 6, t 0 t 0y 4, t 0 t 0y 5, t 1 y 3y 5, t 1 t 1t 0, t 0 t 0y 2, t 1 t 2 1 5: t 1 t 1t 0, t 1 t 2 1, t 0 t 1y 1, t 1 t 1y 0, t 0 t 2 0, t 0 t 0t 1 6: return t 0

21 Algorithm 7 Squaring during final exponentiation (hard-part) [21] Require: A = 5 i=0 aiγi F p 12 ith a i F p 2, 0 i 5 Ensure: A 2 A 0 3a (1 + i)a 2 3 2a 0, A 1 6(1 + i)a 2a 5 + 2a 1 A 2 3a (1 + i)a 2 4 2a 2, A 3 6a 0a 3 + 2a 3 A 4 3a (1 + i)a 2 5 2a 4, A 5 6a 1a 4 + 2a 5 return 5 i=0 Aiγi A.2 RNS Parameter Selection Since p should be around 254 bits, e set n to be 8 and the moduli are chosen near We consider for i, j [1,n] the folloing issues: (1) bitlength of B i cj and C i bj ; (2) Hamming eight of b i and c j. A simple (bounded) exhaustive search program returns the bases shon in Section 5. We denote the bitlength of all B i,j in a length matrix, L B, and the bitlength matrix of all Ci,j as L C. For the selected bases, e have the folloing L B and L C L B = , L C = After truncating the least significant zeros, e get the folloing bitlength, denoted as L B and L C. Note that e applied the same truncation parameter for all the numbers in the same ro. The number of zeros truncated is chosen as min{z(b 1,j ), z(b 2,j ),, z(b n,j )} here z(b i,j ) gives the number of zeros at the LSBs of B i,j L B = , L C = No all of B i,j and C i,j are less than 25 bits, and they fit in one operand of an FPGA DSP slice, hile the standard 34-bit operands do not. In fact, in the implementation e do not truncate more zeros as far as all elements in that ro fit in 25 bits.

A FPGA pairing implementation using the Residue Number System. Sylvain Duquesne, Nicolas Guillermin

A FPGA pairing implementation using the Residue Number System. Sylvain Duquesne, Nicolas Guillermin A FPGA pairing implementation using the Residue Number System Sylvain Duquesne, Nicolas Guillermin IRMAR, UMR CNRS 6625 Université Rennes 1 Campus de Beaulieu 35042 Rennes cedex, France sylvain.duquesne@univ-rennes1.fr,

More information

Tripartite Modular Multiplication

Tripartite Modular Multiplication Tripartite Modular Multiplication Kazuo Sakiyama 1,2, Miroslav Knežević 1, Junfeng Fan 1, Bart Preneel 1, and Ingrid Verbauhede 1 1 Katholieke Universiteit Leuven Department of Electrical Engineering ESAT/SCD-COSIC

More information

Représentation RNS des nombres et calcul de couplages

Représentation RNS des nombres et calcul de couplages Représentation RNS des nombres et calcul de couplages Sylvain Duquesne Université Rennes 1 Séminaire CCIS Grenoble, 7 Février 2013 Sylvain Duquesne (Rennes 1) RNS et couplages Grenoble, 07/02/13 1 / 29

More information

Speeding Up Bipartite Modular Multiplication

Speeding Up Bipartite Modular Multiplication Speeding Up Bipartite Modular Multiplication Miroslav Knežević, Frederik Vercauteren, and Ingrid Verbauhede Katholieke Universiteit Leuven Department of Electrical Engineering - ESAT/SCD-COSIC and IBBT

More information

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields Santosh Ghosh, Dipanwita Roychowdhury, and Abhijit Das Computer Science and Engineering

More information

A High Speed Pairing Coprocessor Using RNS and Lazy Reduction

A High Speed Pairing Coprocessor Using RNS and Lazy Reduction A High Speed Pairing Coprocessor Using RNS and Lazy Reduction Gavin Xiaoxu Yao 1, Junfeng Fan 2, Ray C.C. Cheung 1, and Ingrid Verbauwhede 2 1 Department of Electronic Engineering City University of Hong

More information

Arithmetic operators for pairing-based cryptography

Arithmetic operators for pairing-based cryptography 7. Kryptotag November 9 th, 2007 Arithmetic operators for pairing-based cryptography Jérémie Detrey Cosec, B-IT, Bonn, Germany jdetrey@bit.uni-bonn.de Joint work with: Jean-Luc Beuchat Nicolas Brisebarre

More information

Fast hashing to G2 on pairing friendly curves

Fast hashing to G2 on pairing friendly curves Fast hashing to G2 on pairing friendly curves Michael Scott, Naomi Benger, Manuel Charlemagne, Luis J. Dominguez Perez, and Ezekiel J. Kachisa School of Computing Dublin City University Ballymun, Dublin

More information

Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves

Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves CT-RSA 2012 February 29th, 2012 Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves Joint work with: Nicolas Estibals CARAMEL project-team, LORIA, Université de Lorraine / CNRS / INRIA,

More information

Optimised versions of the Ate and Twisted Ate Pairings

Optimised versions of the Ate and Twisted Ate Pairings Optimised versions of the Ate and Twisted Ate Pairings Seiichi Matsuda 1, Naoki Kanayama 1, Florian Hess 2, and Eiji Okamoto 1 1 University of Tsukuba, Japan 2 Technische Universität Berlin, Germany Abstract.

More information

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields Santosh Ghosh, Dipanwita Roy Chowdhury, and Abhijit Das Computer Science and Engineering

More information

Modular Multiplication in GF (p k ) using Lagrange Representation

Modular Multiplication in GF (p k ) using Lagrange Representation Modular Multiplication in GF (p k ) using Lagrange Representation Jean-Claude Bajard, Laurent Imbert, and Christophe Nègre Laboratoire d Informatique, de Robotique et de Microélectronique de Montpellier

More information

Speeding up Ate Pairing Computation in Affine Coordinates

Speeding up Ate Pairing Computation in Affine Coordinates Speeding up Ate Pairing Computation in Affine Coordinates Duc-Phong Le and Chik How Tan Temasek Laboratories, National University of Singapore, 5A Engineering Drive 1, #09-02, Singapore 11711. (tslld,tsltch)@nus.edu.sg

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography Jean-Luc Beuchat Laboratory of Cryptography and Information Security Graduate School of Systems and Information Engineering University of Tsukuba 1-1-1

More information

Some Efficient Algorithms for the Final Exponentiation of η T Pairing

Some Efficient Algorithms for the Final Exponentiation of η T Pairing Some Efficient Algorithms for the Final Exponentiation of η T Pairing Masaaki Shirase 1, Tsuyoshi Takagi 1, and Eiji Okamoto 2 1 Future University-Hakodate, Japan 2 University of Tsukuba, Japan Abstract.

More information

Faster Pairing Coprocessor Architecture

Faster Pairing Coprocessor Architecture Faster Pairing Coprocessor Architecture GavinXiaoxuYao 1, Junfeng Fan 2, Ray C.C. Cheung 1, and Ingrid Verbauwhede 2 1 Department of Electronic Engineering City University of Hong Kong, Hong Kong SAR Gavin.Yao@student.cityu.edu.hk,

More information

Pairings for Cryptography

Pairings for Cryptography Pairings for Cryptography Michael Naehrig Technische Universiteit Eindhoven Ñ ÐÖÝÔØÓ ºÓÖ Nijmegen, 11 December 2009 Pairings A pairing is a bilinear, non-degenerate map e : G 1 G 2 G 3, where (G 1, +),

More information

Pairings at High Security Levels

Pairings at High Security Levels Pairings at High Security Levels Michael Naehrig Eindhoven University of Technology michael@cryptojedi.org DoE CRYPTODOC Darmstadt, 21 November 2011 Pairings are efficient!... even at high security levels.

More information

Elliptic Curve Cryptography and Security of Embedded Devices

Elliptic Curve Cryptography and Security of Embedded Devices Elliptic Curve Cryptography and Security of Embedded Devices Ph.D. Defense Vincent Verneuil Institut de Mathématiques de Bordeaux Inside Secure June 13th, 2012 V. Verneuil - Elliptic Curve Cryptography

More information

Efficient Tate Pairing Computation Using Double-Base Chains

Efficient Tate Pairing Computation Using Double-Base Chains Efficient Tate Pairing Computation Using Double-Base Chains Chang an Zhao, Fangguo Zhang and Jiwu Huang 1 Department of Electronics and Communication Engineering, Sun Yat-Sen University, Guangzhou 510275,

More information

Faster Explicit Formulas for Computing Pairings over Ordinary Curves

Faster Explicit Formulas for Computing Pairings over Ordinary Curves Faster Explicit Formulas for Computing Pairings over Ordinary Curves Diego F. Aranha 2, Koray Karabina 1, Patrick Longa 1, Catherine H. Gebotys 1, Julio López 2 1 University of Waterloo, {kkarabin,plonga,cgebotys}@uwaterloo.ca

More information

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC Single Base Modular Multiplication for Efficient Hardare RNS Implementations of ECC Karim Bigou and Arnaud Tisserand CNRS, IRISA, INRIA Centre Rennes - Bretagne Atlantique and University Rennes 1, 6 rue

More information

Faster F p -arithmetic for Cryptographic Pairings on Barreto-Naehrig Curves

Faster F p -arithmetic for Cryptographic Pairings on Barreto-Naehrig Curves Faster F p -arithmetic for Cryptographic Pairings on Barreto-Naehrig Curves Junfeng Fan, Frederik Vercauteren and Ingrid Verbauwhede Katholieke Universiteit Leuven, COSIC May 18, 2009 1 Outline What is

More information

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography.

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography. Power Consumption Analysis General principle: measure the current I in the circuit Arithmetic Level Countermeasures for ECC Coprocessor Arnaud Tisserand, Thomas Chabrier, Danuta Pamula I V DD circuit traces

More information

Constructing Abelian Varieties for Pairing-Based Cryptography

Constructing Abelian Varieties for Pairing-Based Cryptography for Pairing-Based CWI and Universiteit Leiden, Netherlands Workshop on Pairings in Arithmetic Geometry and 4 May 2009 s MNT MNT Type s What is pairing-based cryptography? Pairing-based cryptography refers

More information

Analysis of Optimum Pairing Products at High Security Levels

Analysis of Optimum Pairing Products at High Security Levels Analysis of Optimum Pairing Products at High Security Levels Xusheng Zhang and Dongdai Lin Institute of Software, Chinese Academy of Sciences Institute of Information Engineering, Chinese Academy of Sciences

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography J.-L. Beuchat 1 N. Brisebarre 2 J. Detrey 3 E. Okamoto 1 1 University of Tsukuba, Japan 2 École Normale Supérieure de Lyon, France 3 Cosec, b-it, Bonn,

More information

A Remark on Implementing the Weil Pairing

A Remark on Implementing the Weil Pairing A Remark on Implementing the Weil Pairing Cheol Min Park 1, Myung Hwan Kim 1 and Moti Yung 2 1 ISaC and Department of Mathematical Sciences, Seoul National University, Korea {mpcm,mhkim}@math.snu.ac.kr

More information

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) Stefan Tillich, Johann Großschädl Institute for Applied Information Processing and

More information

Hardware Acceleration of the Tate Pairing in Characteristic Three

Hardware Acceleration of the Tate Pairing in Characteristic Three Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1 Introduction Pairing based cryptography is a (fairly)

More information

Generating more Kawazoe-Takahashi Genus 2 Pairing-friendly Hyperelliptic Curves

Generating more Kawazoe-Takahashi Genus 2 Pairing-friendly Hyperelliptic Curves Generating more Kawazoe-Takahashi Genus 2 Pairing-friendly Hyperelliptic Curves Ezekiel J Kachisa School of Computing Dublin City University Ireland ekachisa@computing.dcu.ie Abstract. Constructing pairing-friendly

More information

Katherine Stange. ECC 2007, Dublin, Ireland

Katherine Stange. ECC 2007, Dublin, Ireland in in Department of Brown University http://www.math.brown.edu/~stange/ in ECC Computation of ECC 2007, Dublin, Ireland Outline in in ECC Computation of in ECC Computation of in Definition A integer sequence

More information

Implementing Pairing-Based Cryptosystems

Implementing Pairing-Based Cryptosystems Implementing Pairing-Based Cryptosystems Zhaohui Cheng and Manos Nistazakis School of Computing Science, Middlesex University White Hart Lane, London N17 8HR, UK. {m.z.cheng, e.nistazakis}@mdx.ac.uk Abstract:

More information

Compact Ring LWE Cryptoprocessor

Compact Ring LWE Cryptoprocessor 1 Compact Ring LWE Cryptoprocessor CHES 2014 Sujoy Sinha Roy 1, Frederik Vercauteren 1, Nele Mentens 1, Donald Donglong Chen 2 and Ingrid Verbauwhede 1 1 ESAT/COSIC and iminds, KU Leuven 2 Electronic Engineering,

More information

An Algorithm for the η T Pairing Calculation in Characteristic Three and its Hardware Implementation

An Algorithm for the η T Pairing Calculation in Characteristic Three and its Hardware Implementation An Algorithm for the η T Pairing Calculation in Characteristic Three and its Hardware Implementation Jean-Luc Beuchat 1 Masaaki Shirase 2 Tsuyoshi Takagi 2 Eiji Okamoto 1 1 Graduate School of Systems and

More information

Efficient Computation of Tate Pairing in Projective Coordinate Over General Characteristic Fields

Efficient Computation of Tate Pairing in Projective Coordinate Over General Characteristic Fields Efficient Computation of Tate Pairing in Projective Coordinate Over General Characteristic Fields Sanjit Chatterjee, Palash Sarkar and Rana Barua Cryptology Research Group Applied Statistics Unit Indian

More information

Improving Modular Inversion in RNS using the Plus-Minus Method

Improving Modular Inversion in RNS using the Plus-Minus Method Improving Modular Inversion in RNS using the Plus-Minus Method Karim igou 2,1 and Arnaud Tisserand 3,1 1 IRISA, 2 INRIA Centre Rennes - retagne Atlantique, 3 CNRS, University Rennes 1, 6 rue Kerampont,

More information

On the complexity of computing discrete logarithms in the field F

On the complexity of computing discrete logarithms in the field F On the complexity of computing discrete logarithms in the field F 3 6 509 Francisco Rodríguez-Henríquez CINVESTAV-IPN Joint work with: Gora Adj Alfred Menezes Thomaz Oliveira CINVESTAV-IPN University of

More information

New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields

New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields Patrick Longa 1 and Ali Miri 2 1 Department of Electrical and Computer Engineering University of Waterloo,

More information

Non-generic attacks on elliptic curve DLPs

Non-generic attacks on elliptic curve DLPs Non-generic attacks on elliptic curve DLPs Benjamin Smith Team GRACE INRIA Saclay Île-de-France Laboratoire d Informatique de l École polytechnique (LIX) ECC Summer School Leuven, September 13 2013 Smith

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography Jean-Luc Beuchat 1, Nicolas Brisebarre 2,3, Jérémie Detrey 3, and Eiji Okamoto 1 1 Laboratory of Cryptography and Information Security, University of

More information

The Eta Pairing Revisited

The Eta Pairing Revisited 1 The Eta Pairing Revisited F. Hess, N.P. Smart and F. Vercauteren Abstract In this paper we simplify and extend the Eta pairing, originally discovered in the setting of supersingular curves by Baretto

More information

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields . Motivation and introduction An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields Marcin Rogawski Ekawat Homsirikamol Kris Gaj Cryptographic Engineering Research Group (CERG)

More information

Efficient Pairings Computation on Jacobi Quartic Elliptic Curves

Efficient Pairings Computation on Jacobi Quartic Elliptic Curves Efficient Pairings Computation on Jacobi Quartic Elliptic Curves Sylvain Duquesne 1, Nadia El Mrabet 2, and Emmanuel Fouotsa 3 1 IRMAR, UMR CNRS 6625, Université Rennes 1, Campus de Beaulieu 35042 Rennes

More information

Attractive Subfamilies of BLS Curves for Implementing High-Security Pairings

Attractive Subfamilies of BLS Curves for Implementing High-Security Pairings Attractive Subfamilies of BLS Curves for Implementing High-Security Pairings Craig Costello,2, Kristin Lauter 2, and Michael Naehrig 2,3 Information Security Institute Queensland University of Technology,

More information

A new algorithm for residue multiplication modulo

A new algorithm for residue multiplication modulo A new algorithm for residue multiplication modulo 2 521 1 Shoukat Ali and Murat Cenk Institute of Applied Mathematics Middle East Technical University, Ankara, Turkey shoukat.1983@gmail.com mcenk@metu.edu.tr

More information

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University Efficient Implementation of Cryptographic pairings Mike Scott Dublin City University First Steps To do Pairing based Crypto we need two things l Efficient algorithms l Suitable elliptic curves We have

More information

Constructing Pairing-Friendly Elliptic Curves for Cryptography

Constructing Pairing-Friendly Elliptic Curves for Cryptography Constructing Pairing-Friendly Elliptic Curves for Cryptography University of California, Berkeley, USA 2nd KIAS-KMS Summer Workshop on Cryptography Seoul, Korea 30 June 2007 Outline 1 Pairings in Cryptography

More information

Hardware implementations of ECC

Hardware implementations of ECC Hardware implementations of ECC The University of Electro- Communications Introduction Public- key Cryptography (PKC) The most famous PKC is RSA and ECC Used for key agreement (Diffie- Hellman), digital

More information

Affine Precomputation with Sole Inversion in Elliptic Curve Cryptography

Affine Precomputation with Sole Inversion in Elliptic Curve Cryptography Affine Precomputation with Sole Inversion in Elliptic Curve Cryptography Erik Dahmen, 1 Katsuyuki Okeya, 2 and Daniel Schepers 1 1 Technische Universität Darmstadt, Fachbereich Informatik, Hochschulstr.10,

More information

Efficient hash maps to G 2 on BLS curves

Efficient hash maps to G 2 on BLS curves Efficient hash maps to G 2 on BLS curves Alessandro Budroni 1 and Federico Pintore 2 1 MIRACL Labs, London, England - budroni.alessandro@gmail.com 2 Department of Mathematics, University of Trento, Italy

More information

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University Efficient Implementation of Cryptographic pairings Mike Scott Dublin City University First Steps To do Pairing based Crypto we need two things Efficient algorithms Suitable elliptic curves We have got

More information

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m )

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m ) 1 / 19 Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m ) Métairie Jérémy, Tisserand Arnaud and Casseau Emmanuel CAIRN - IRISA July 9 th, 2015 ISVLSI 2015 PAVOIS ANR 12 BS02

More information

Hybrid Binary-Ternary Joint Sparse Form and its Application in Elliptic Curve Cryptography

Hybrid Binary-Ternary Joint Sparse Form and its Application in Elliptic Curve Cryptography Hybrid Binary-Ternary Joint Sparse Form and its Application in Elliptic Curve Cryptography Jithra Adikari, Student Member, IEEE, Vassil Dimitrov, and Laurent Imbert Abstract Multi-exponentiation is a common

More information

Constructing Families of Pairing-Friendly Elliptic Curves

Constructing Families of Pairing-Friendly Elliptic Curves Constructing Families of Pairing-Friendly Elliptic Curves David Freeman Information Theory Research HP Laboratories Palo Alto HPL-2005-155 August 24, 2005* cryptography, pairings, elliptic curves, embedding

More information

FINDING COMPOSITE ORDER ORDINARY ELLIPTIC CURVES USING THE COCKS-PINCH METHOD

FINDING COMPOSITE ORDER ORDINARY ELLIPTIC CURVES USING THE COCKS-PINCH METHOD FINDING COMPOSITE ORDER ORDINARY ELLIPTIC CURVES USING THE COCKS-PINCH METHOD D. BONEH, K. RUBIN, AND A. SILVERBERG Abstract. We apply the Cocks-Pinch method to obtain pairing-friendly composite order

More information

Multi-core Implementation of the Tate Pairing over Supersingular Elliptic Curves

Multi-core Implementation of the Tate Pairing over Supersingular Elliptic Curves Multi-core Implementation of the Tate Pairing over Supersingular Elliptic Curves Jean-Luc Beuchat 1, Emmanuel López-Trejo, Luis Martínez-Ramos 3, Shigeo Mitsunari 4, and Francisco Rodríguez-Henríquez 3

More information

Generating more MNT elliptic curves

Generating more MNT elliptic curves Generating more MNT elliptic curves Michael Scott 1 and Paulo S. L. M. Barreto 2 1 School of Computer Applications Dublin City University Ballymun, Dublin 9, Ireland. mike@computing.dcu.ie 2 Universidade

More information

The Eta Pairing Revisited

The Eta Pairing Revisited The Eta Pairing Revisited F. Hess 1, N. Smart 2, and Frederik Vercauteren 3 1 Technische Universität Berlin, Fakultät II, Institut für Mathematik, MA 8-1, Strasse des 17. Juni 136, D-10623 Berlin, Germany.

More information

Formulas for cube roots in F 3 m

Formulas for cube roots in F 3 m Discrete Applied Mathematics 155 (2007) 260 270 www.elsevier.com/locate/dam Formulas for cube roots in F 3 m Omran Ahmadi a, Darrel Hankerson b, Alfred Menezes a a Department of Combinatorics and Optimization,

More information

Implementing Cryptographic Pairings over Barreto-Naehrig Curves

Implementing Cryptographic Pairings over Barreto-Naehrig Curves Implementing Cryptographic Pairings over Barreto-Naehrig Curves Augusto Jun Devegili 1, Michael Scott 2, and Ricardo Dahab 1 1 Instituto de Computação, Universidade Estadual de Campinas Caixa Postal 6176,

More information

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A new class of irreducible pentanomials for polynomial based multipliers in binary fields Noname manuscript No. (will be inserted by the editor) A new class of irreducible pentanomials for polynomial based multipliers in binary fields Gustavo Banegas Ricardo Custódio Daniel Panario the date

More information

Montgomery Algorithm for Modular Multiplication with Systolic Architecture

Montgomery Algorithm for Modular Multiplication with Systolic Architecture Montgomery Algorithm for Modular Multiplication with ystolic Architecture MRABET Amine LIAD Paris 8 ENIT-TUNI EL MANAR University A - MP - Gardanne PAE 016 1 Plan 1 Introduction for pairing Montgomery

More information

Reducing the Key Size of Rainbow using Non-Commutative Rings

Reducing the Key Size of Rainbow using Non-Commutative Rings Reducing the Key Size of Rainbow using Non-Commutative Rings Takanori Yasuda (Institute of systems, Information Technologies and Nanotechnologies (ISIT)), Kouichi Sakurai (ISIT, Kyushu university) and

More information

Optimal Use of Montgomery Multiplication on Smart Cards

Optimal Use of Montgomery Multiplication on Smart Cards Optimal Use of Montgomery Multiplication on Smart Cards Arnaud Boscher and Robert Naciri Oberthur Card Systems SA, 71-73, rue des Hautes Pâtures, 92726 Nanterre Cedex, France {a.boscher, r.naciri}@oberthurcs.com

More information

A note on López-Dahab coordinates

A note on López-Dahab coordinates A note on López-Dahab coordinates Tanja Lange Faculty of Mathematics, Matematiktorvet - Building 303, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark tanja@hyperelliptic.org Abstract López-Dahab

More information

A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p

A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p Nicolas Guillermin 1,2 1 DGA Information Superiority, Bruz, France 2 IRMAR, Université Rennes 1, France Abstract. We present

More information

PUBLIC-KEY cryptography (PKC), a concept introduced

PUBLIC-KEY cryptography (PKC), a concept introduced 1 Speeding Up Barrett and Montgomery Modular Multiplications Miroslav Knežević, Student Member, IEEE, Frederik Vercauteren, and Ingrid Verbauwhede, Senior Member, IEEE Abstract This paper proposes two

More information

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography Selçuk Baktır, Berk Sunar {selcuk,sunar}@wpi.edu Department of Electrical & Computer Engineering Worcester Polytechnic Institute

More information

ABHELSINKI UNIVERSITY OF TECHNOLOGY

ABHELSINKI UNIVERSITY OF TECHNOLOGY Identity-Based Cryptography T-79.5502 Advanced Course in Cryptology Billy Brumley billy.brumley at hut.fi Helsinki University of Technology Identity-Based Cryptography 1/24 Outline Classical ID-Based Crypto;

More information

arxiv: v3 [cs.cr] 5 Aug 2014

arxiv: v3 [cs.cr] 5 Aug 2014 Further Refinements of Miller Algorithm on Edwards curves Duc-Phong Le, Chik How Tan Temasek Laboratories, National University of Singapore 5A Engineering Drive 1, #09-02, Singapore 117411. arxiv:1305.2694v3

More information

A VLSI Algorithm for Modular Multiplication/Division

A VLSI Algorithm for Modular Multiplication/Division A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp

More information

Unbalancing Pairing-Based Key Exchange Protocols

Unbalancing Pairing-Based Key Exchange Protocols Unbalancing Pairing-Based Key Exchange Protocols Michael Scott Certivox Labs mike.scott@certivox.com Abstract. In many pairing-based protocols more than one party is involved, and some or all of them may

More information

Faster Squaring in the Cyclotomic Subgroup of Sixth Degree Extensions

Faster Squaring in the Cyclotomic Subgroup of Sixth Degree Extensions Faster Squaring in the Cyclotomic Subgroup of Sixth Degree Extensions Robert Granger and Michael Scott School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland. {rgranger,mike}@computing.dcu.ie

More information

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials C. Shu, S. Kwon and K. Gaj Abstract: The efficient design of digit-serial multipliers

More information

An Analysis of Affine Coordinates for Pairing Computation

An Analysis of Affine Coordinates for Pairing Computation An Analysis of Affine Coordinates for Pairing Computation Kristin Lauter, Peter L. Montgomery, and Michael Naehrig Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA {klauter, petmon, mnaehrig}@microsoft.com

More information

Pairings for Cryptographers

Pairings for Cryptographers Pairings for Cryptographers Steven D. Galbraith 1, Kenneth G. Paterson 1, and Nigel P. Smart 2 1 Information Security Group, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, United Kingdom.

More information

Fast arithmetic and pairing evaluation on genus 2 curves

Fast arithmetic and pairing evaluation on genus 2 curves Fast arithmetic and pairing evaluation on genus 2 curves David Freeman University of California, Berkeley dfreeman@math.berkeley.edu November 6, 2005 Abstract We present two algorithms for fast arithmetic

More information

On Near Prime-Order Elliptic Curves with Small Embedding Degrees

On Near Prime-Order Elliptic Curves with Small Embedding Degrees On Near Prime-Order Elliptic Curves with Small Embedding Degrees Duc-Phong Le, Nadia El Mrabet, Tan Chik How To cite this version: Duc-Phong Le, Nadia El Mrabet, Tan Chik How. On Near Prime-Order Elliptic

More information

Binary-Ternary Plus-Minus Modular Inversion in RNS

Binary-Ternary Plus-Minus Modular Inversion in RNS inary-ternary Plus-Minus Modular Inversion in RNS Karim igou, Arnaud Tisserand To cite this version: Karim igou, Arnaud Tisserand inary-ternary Plus-Minus Modular Inversion in RNS IEEE Transactions on

More information

On near prime-order elliptic curves with small embedding degrees (Full version)

On near prime-order elliptic curves with small embedding degrees (Full version) On near prime-order elliptic curves with small embedding degrees (Full version) Duc-Phong Le 1, Nadia El Mrabet 2, and Chik How Tan 1 1 Temasek Laboratories, National University of Singapore {tslld,tsltch}@nus.edu.sg

More information

Introduction to Elliptic Curve Cryptography. Anupam Datta

Introduction to Elliptic Curve Cryptography. Anupam Datta Introduction to Elliptic Curve Cryptography Anupam Datta 18-733 Elliptic Curve Cryptography Public Key Cryptosystem Duality between Elliptic Curve Cryptography and Discrete Log Based Cryptography Groups

More information

On the Optimal Pre-Computation of Window τ NAF for Koblitz Curves

On the Optimal Pre-Computation of Window τ NAF for Koblitz Curves On the Optimal Pre-Computation of Window τ NAF for Koblitz Curves William R. Trost and Guangwu Xu Abstract Koblitz curves have been a nice subject of consideration for both theoretical and practical interests.

More information

Improving Modular Inversion in RNS using the Plus-Minus Method

Improving Modular Inversion in RNS using the Plus-Minus Method Author manuscript, published in "CHES - 15th Workshop on Cryptographic Hardware and Embedded Systems - 2013 8086 (2013) 233-249" DOI : 10.1007/978-3-642-40349-1_14 Improving Modular Inversion in RNS using

More information

Polynomial Interpolation in the Elliptic Curve Cryptosystem

Polynomial Interpolation in the Elliptic Curve Cryptosystem Journal of Mathematics and Statistics 7 (4): 326-331, 2011 ISSN 1549-3644 2011 Science Publications Polynomial Interpolation in the Elliptic Curve Cryptosystem Liew Khang Jie and Hailiza Kamarulhaili School

More information

Hardware Acceleration of the Tate Pairing in Characteristic Three

Hardware Acceleration of the Tate Pairing in Characteristic Three Hardware Acceleration of the Tate Pairing in Characteristic Three P. Grabher and D. Page Institute for Applied Information Processing and Communications, Graz University of Technology, Inffeldgasse 16a,

More information

Faster Pairings on Special Weierstrass Curves

Faster Pairings on Special Weierstrass Curves craig.costello@qut.edu.au Queensland University of Technology Pairing 2009 Joint work with Huseyin Hisil, Colin Boyd, Juanma Gonzalez-Nieto, Kenneth Koon-Ho Wong Table of contents 1 Introduction The evolution

More information

New attacks on RSA with Moduli N = p r q

New attacks on RSA with Moduli N = p r q New attacks on RSA with Moduli N = p r q Abderrahmane Nitaj 1 and Tajjeeddine Rachidi 2 1 Laboratoire de Mathématiques Nicolas Oresme Université de Caen Basse Normandie, France abderrahmane.nitaj@unicaen.fr

More information

A New Family of Pairing-Friendly elliptic curves

A New Family of Pairing-Friendly elliptic curves A New Family of Pairing-Friendly elliptic curves Michael Scott 1 and Aurore Guillevic 2 1 MIRACL.com 2 Université de Lorraine, CNRS, Inria, LORIA, Nancy, France May 21, 2018 Abstract There have been recent

More information

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs Sujoy Sinha Roy, Chester Rebeiro and Debdeep Mukhopadhyay Department of Computer Science and Engineering

More information

A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m )

A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m ) A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m ) Johann Großschädl Graz University of Technology Institute for Applied Information Processing and Communications Inffeldgasse

More information

Secure Bilinear Diffie-Hellman Bits

Secure Bilinear Diffie-Hellman Bits Secure Bilinear Diffie-Hellman Bits Steven D. Galbraith 1, Herbie J. Hopkins 1, and Igor E. Shparlinski 2 1 Mathematics Department, Royal Holloway University of London Egham, Surrey, TW20 0EX, UK Steven.Galbraith@rhul.ac.uk,

More information

Speeding Up the Fixed-Base Comb Method for Faster Scalar Multiplication on Koblitz Curves

Speeding Up the Fixed-Base Comb Method for Faster Scalar Multiplication on Koblitz Curves Speeding Up the Fixed-Base Comb Method for Faster Scalar Multiplication on Koblitz Curves Christian Hanser and Christian Wagner Institute for Applied Information Processing and Communications (IAIK), Graz

More information

2.2. The Weil Pairing on Elliptic Curves If A and B are r-torsion points on some elliptic curve E(F q d ), let us denote the r-weil pairing of A and B

2.2. The Weil Pairing on Elliptic Curves If A and B are r-torsion points on some elliptic curve E(F q d ), let us denote the r-weil pairing of A and B Weil Pairing vs. Tate Pairing in IBE systems Ezra Brown, Eric Errthum, David Fu October 10, 2003 1. Introduction Although Boneh and Franklin use the Weil pairing on elliptic curves to create Identity-

More information

Implementing Cryptographic Pairings on Smartcards

Implementing Cryptographic Pairings on Smartcards Implementing Cryptographic Pairings on Smartcards Michael Scott, Neil Costigan, and Wesam Abdulwahab School of Computer Applications Dublin City University Ballymun, Dublin 9, Ireland. mike@computing.dcu.ie

More information

Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand

Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand Efficient Modular Exponentiation Based on Multiple Multiplications by a Common Operand Christophe Negre, Thomas Plantard, Jean-Marc Robert Team DALI (UPVD) and LIRMM (UM2, CNRS), France CCISR, SCIT, (University

More information

Optimal Pairings. F. Vercauteren

Optimal Pairings. F. Vercauteren Optimal Pairings F. Vercauteren Department of Electrical Engineering, Katholieke Universiteit Leuven Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium frederik.vercauteren@esat.kuleuven.be Abstract.

More information

Efficient multiplication in characteristic three fields

Efficient multiplication in characteristic three fields Efficient multiplication in characteristic three fields Murat Cenk and M. Anwar Hasan Department of Electrical and Computer Engineering University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 mcenk@uwaterloo.ca

More information

International Journal of Advanced Computer Technology (IJACT)

International Journal of Advanced Computer Technology (IJACT) AN EFFICIENT DESIGN OF LOW POWER,FAST EL- LIPTIC CURVE SCALAR MULTIPLIER IN ECC USING S Jayalakshmi K R, M.Tech student, Mangalam college of engineering,kottayam,india; Ms.Hima Sara Jacob, Assistant professor,

More information

Elliptic Curve Cryptography

Elliptic Curve Cryptography The State of the Art of Elliptic Curve Cryptography Ernst Kani Department of Mathematics and Statistics Queen s University Kingston, Ontario Elliptic Curve Cryptography 1 Outline 1. ECC: Advantages and

More information