Efficient Fixed Base Exponentiation and Scalar Multiplication based on a Multiplicative Splitting Exponent Recoding

To cite this version: Jean-Marc Robert, Christophe Negre, Thomas Plantard. Efficient Fixed Base Exponentiation and Scalar Multiplication based on a Multiplicative Splitting Exponent Recoding. Journal of Cryptographic Engineering, Springer, In press, <10.1007/s13389-018-0196-7>. <lirmm-01926767>

HAL Id: lirmm-01926767
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01926767
Submitted on 19 Nov 2018

Jean-Marc Robert (2,3), Christophe Negre (2,3) and Thomas Plantard (1)
(1) CCISR, SCIT, University of Wollongong, Australia
(2) Team DALI, Université de Perpignan Via Domitia, France
(3) LIRMM, UMR 5506, Université de Montpellier and CNRS, France

Abstract

The Digital Signature Algorithm (DSA) (resp. ECDSA) involves a modular exponentiation (resp. scalar multiplication) of a public and known base by a random one-time exponent. In order to speed up this operation, well-known methods take advantage of the memorization of base powers (resp. base multiples). The best approaches are the Fixed-base Radix-R method and the Fixed-base Comb method. In this paper we present a new approach for the storage/online computation trade-off, by using a multiplicative splitting of the digits of the radix-R representation of the exponent. We adapt classical algorithms for modular exponentiation and scalar multiplication in order to take advantage of the proposed exponent recoding. A complexity analysis for practical sizes shows that our proposed approach requires less storage for a given level of online computation. This is confirmed by implementation results showing significant memory savings, up to 3 times for the largest NIST standardized key sizes, compared to the state of the art approaches.

Keywords. RNS, Multiplicative Splitting, Digital Signature, Fixed Base, Modular Exponentiation, Scalar Multiplication, Memory Storage, Efficient Software Implementation.

I. INTRODUCTION

In the DSS (Digital Signature Standard), DSA (Digital Signature Algorithm) is a popular authentication protocol. According to the NIST standard (see [12]), the public parameters are $p$, $q$ and $g$. The parameter $g$ is a generator of a multiplicative sub-group of $\mathbb{F}_p^*$ of size $q$. The integers $p$ and $q$ are two primes with sizes corresponding to the required security level: for the recommended security levels of 80-256 bits, $q$ has to be a 160-512 bit integer.

When a server needs to sign a batch of documents, the most costly operations are the modular exponentiations $g^k \bmod p$ (one per signature), where $g$, $p$ are fixed and $k$ is a one-time random integer. Another popular standard for electronic signature is ECDSA, which uses the group of points on an elliptic curve $(E(\mathbb{F}_p), +)$ instead of $(\mathbb{F}_p^*, \times)$. The signature algorithm ECDSA is very similar to DSA and its main operation

is a scalar multiplication $k \cdot P$ for $P \in E(\mathbb{F}_p)$. In order to cover both cases, DSA and ECDSA, we consider a multiplicative abelian group $(G, \times)$ in which we have to compute $g^k$ for $g \in G$ and $k \in \mathbb{N}$.

In this article we consider the following practical case: a server has to compute a large number of signatures, which involves a large number of exponentiations $g^k$ with the same $g \in G$ and several random $k$. We assume that the server has a large cache and RAM (Random Access Memory), so that we can store a large amount of precomputed data to speed up these exponentiations. In the sequel, by offline computation we mean the data computed only once and used in every signature generation; by online computation we mean the operations required only in a single exponentiation $g^k$ for a given $k$.

The main known state-of-the-art methods which take advantage of a large amount of precomputed data are the Fixed-base Radix-R method presented by Gordon in [8] and the Fixed-base Comb method presented by Lim and Lee in [14]. The Fixed-base Radix-R method of [8] precomputes $g^{aR^i}$ for $0 \le a < R$ and then, using the radix-R expression of $k$, obtains the exponentiation $g^k$ with $\log_R(k)$ multiplications. The Fixed-base Comb method uses a Comb decomposition of $k$ (instead of a radix-R representation) and requires less precomputed data at the cost of some extra squarings. In [17] the authors provide a variant of the Radix-R approach using the $\mathrm{NAF}_w$ recoding, which needs fewer online multiplications than the radix-R approach but pays a penalty of some extra squarings.

Contributions. We investigate some new strategies for a better trade-off between storage and online computation in fixed-base exponentiation. To reach this goal, we propose to use the representation of the exponent in radix $R$ as $k = \sum_{i=0}^{l-1} k_i R^i$ and then compute a multiplicative splitting of each digit $k_i$. Specifically, we use a radix $R = m_0 m_1$ with coprime $m_0, m_1$.

An RNS representation of a digit $k_i \in [0, R[$ in $\{m_0, m_1\}$ leads to a splitting into two parts: one part $k_i^{(0)}$ whose value is at most $m_0$ and the other $k_i^{(1)}$ whose value is at most $m_1$. We apply this process to all the digits of the radix-R representation of the exponent. While processing the exponentiation, the digits $k_i^{(1)}$ are handled with a look-up table and the digits $k_i^{(0)}$ are handled with online computation. This approach was part of a preliminary version of this paper published in the proceedings of the WAIFI 2016 conference [20].

We present a novel approach for the multiplicative splitting of the digits of the exponent: if we choose the radix $R$ as a prime integer then, by a partial execution of the extended Euclidean algorithm, one can re-express a digit $k_i$ as a product $k_i = k_i^{(0)} (k_i^{(1)})^{-1} \bmod R$ where $k_i^{(0)} < c$ and $k_i^{(1)} \le R/c$ for a fixed $c$. Again, this splitting can be applied to all digits of the radix-R representation of the exponent. The exponentiation algorithms can then be computed with memorizations related to the $(k_i^{(1)})^{-1}$ part of the digit splitting and online computation to handle the part $k_i^{(0)}$ of the digit splitting. The main advantage of this version with a prime $R$ is that the resulting exponentiation algorithm is constant time, which means that it is robust against timing attacks.

We study the corresponding complexities and storage amounts, and compare the results with the best approaches of the literature for fixed-base modular exponentiation (resp. scalar multiplication) for NIST recommended fields (resp. curves). The metric chosen for a comparison between the proposed algorithms is the following: for a given level of online computation, the best approach is the one which has the lowest amount of precomputed data. Using

this metric we show that the proposed approach is the most efficient for a large range of practical cases. We also implement these approaches in software and we perform tests in order to validate the complexity analysis. Our approaches also provide some flexibility in terms of the required storage amount: one can choose the storage amount according to the device resources available and compatible with the global computation load of the system.

Organization of the paper. In Section II, we review the best approaches of the literature for fixed-base exponentiation and we give their complexities and storage requirements. In Section III, we present a multiplicative splitting recoding of the exponent in radix $R = m_0 m_1$ and a fixed-base exponentiation using this recoding. In Section IV, we present a multiplicative splitting recoding for $R$ prime and the corresponding exponentiation algorithm. In Section V, we compare the complexity results and software implementations of the proposed approach to the best approaches of the literature for modular exponentiation and scalar multiplication. Finally, in Section VI, we give some concluding remarks and perspectives.

II. STATE OF THE ART OF FIXED-BASE EXPONENTIATION

We consider digital signature algorithms based on the discrete logarithm in a finite group. The main ones are DSA, where the considered group is a subgroup of prime order $q$ in the multiplicative group $\mathbb{F}_p^*$, and ECDSA, where the group is the set of points on an elliptic curve $E(\mathbb{F}_p)$ [16], [13]. For the sake of simplicity, in the sequel, we use a generic abelian multiplicative group $(G, \times)$ of order $q$. The algorithms presented later in this paper extend directly to abelian groups with an additive group law like $E(\mathbb{F}_p)$. Generating a digital signature consists in computing $(s_1, s_2)$ from a message $m \in \{0, 1\}^*$, a secret integer $x$ and a random integer $k$ as follows:

$$s_1 \leftarrow H_1(g^k), \qquad s_2 \leftarrow (H_2(m) + s_1 x)\, k^{-1} \bmod q.$$

Here, $H_1$ is a function $G \to \mathbb{Z}/q\mathbb{Z}$ and $H_2$ is a cryptographic hash function $\{0, 1\}^* \to \mathbb{Z}/q\mathbb{Z}$.

One can see that the most costly operation in a signature generation is the exponentiation $g^k$ of a fixed $g \in G$, where $k$ is a one-time random exponent of about the same size as $q$. This exponentiation can be done with the classical Square-and-multiply algorithm.

Square-and-multiply exponentiation. The left-to-right version of the square-and-multiply exponentiation scans the bits $k_i$ of $k$ from left to right and performs a squaring followed by a multiplication when $k_i = 1$. In terms of complexity, given the bit length $t$ of $k$, the number of squarings is $t - 1$ and the number of multiplications to be computed is $t/2$ on average for a randomly chosen exponent. There is no storage in this case.

Side channel analysis. The above method is threatened by side-channel analysis. These attacks extract part of the exponent by monitoring and analyzing the computation time, the power consumption or the electromagnetic emanations. In this paper, we focus on servers which generate large amounts of signatures very quickly and are not physically accessible to an attacker. The main threat in this case is the timing attack. This attack attempts to find the sequence of operations (multiplication and squaring) of an exponentiation by a statistical analysis of several timings of an exponentiation. If the assumed sequence of operations is correct, the attacker can deduce the key bits of the exponent, since each multiplication corresponds to a bit equal to 1 and otherwise the bit is 0. A general solution

to thwart this attack is to render the sequence of operations uncorrelated to the key bits, which means that we need to remove any "if" test on the key bits or digits in the exponentiation algorithm.

Algorithm 1 Left-to-Right Square-and-multiply Exponentiation
Require: an integer $k = (k_{t-1}, \ldots, k_0)_2$, and $g$ an element of $G$.
Ensure: $X = g^k$
1: $X \leftarrow 1$
2: for $i$ from $t-1$ downto $0$ do
3:   $X \leftarrow X^2$
4:   if $k_i = 1$ then
5:     $X \leftarrow X \cdot g$
6: return $(X)$

Fixed base exponentiation. When the base $g$ is fixed, one can precompute some data in advance in order to reduce the number of operations in the online computation of the exponentiation. This is the case when a server has to intensively compute a number of signatures with the same $g$. For example, the method presented by Gordon in [8] is a modified square-and-multiply algorithm: one first stores the $t$ successive squarings of $g$ (that is, the sequence of $g^{2^i}$); then, for a given computation of $g^k$, one has to multiply the $g^{2^i}$ corresponding to $k_i = 1$. In terms of complexity, given the bit length $t$ of the exponent, one now has no squarings and the number of multiplications is $t/2$ on average. In return, one has to store $t$ elements of $G$. We can reduce the amount of online computation even further by increasing the precomputed data. This is the strategy followed by the main approaches of the literature.

Radix-R method. Gordon in [8] mentions the generalization of his first idea to a radix $R = 2^w$ representation of the exponent $k = \sum_{i=0}^{l-1} k_i R^i$. This consists in the memorization of the values $g^{a R^j}$, with $a \in [0, \ldots, R-1]$ and $0 \le j < l$, where $l$ is the length of the exponent in radix-R representation. If we denote $w = \log_2(R)$ then we have $l = \lceil t/w \rceil$. In this case, the online computation consists of $l - 1$ multiplications, for a storage amount of $l \cdot R$ values in $G$. In the sequel, we will call this approach the Fixed-base Radix-R exponentiation method (see Algorithm 2). This algorithm is constant time as soon as the multiplications by 1 (i.e., when $k_i = 0$) are performed as any other multiplication or, alternatively, by using the radix-R recoding of [11] which avoids $k_i = 0$.

Comb method.
Another classical method is the so-called Fixed-base Comb method, which was initially proposed by Lim and Lee in [14]. This method attempts to trade some of the storage of Algorithm 2 for a few online computed squarings. It is based on the following decomposition of the exponent $k$:

$$k = \sum_{j=0}^{d-1} \underbrace{\left( \sum_{i=0}^{w-1} k_{id+j}\, 2^{id} \right)}_{K_j} 2^j \quad \text{where } d = \lceil t/w \rceil. \qquad (1)$$

Each integer $K_j$ can be seen as a comb: its $w$ bits $k_{id+j}$, $0 \le i < w$, are picked from $k$ at regular intervals of $d$ bits.
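As an illustration of decomposition (1), the following Python sketch splits an exponent into its comb digits $K_j$ and uses them in a comb-style exponentiation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`comb_digits`, `comb_table`, `comb_pow`) are hypothetical, the group is taken to be the integers modulo `p`, and the table is keyed directly by the integer value of $K_j$ rather than by a packed $w$-bit index.

```python
def comb_digits(k, t, w):
    """Split a t-bit exponent k into d = ceil(t/w) comb digits K_j, as in (1)."""
    d = -(-t // w)  # ceil(t / w)
    digits = []
    for j in range(d):
        K_j = 0
        for i in range(w):
            bit = (k >> (i * d + j)) & 1   # bit k_{id+j} of the exponent
            K_j |= bit << (i * d)          # placed at position id inside K_j
        digits.append(K_j)
    return d, digits


def comb_table(g, p, t, w):
    """Offline phase: store g^a for every possible comb value a (2^w entries)."""
    d = -(-t // w)
    table = {}
    for idx in range(1 << w):
        a = sum(((idx >> i) & 1) << (i * d) for i in range(w))
        table[a] = pow(g, a, p)
    return table


def comb_pow(g, k, p, t, w):
    """Online phase: one squaring and one table multiplication per comb digit."""
    d, digits = comb_digits(k, t, w)
    table = comb_table(g, p, t, w)  # recomputed here only to keep the sketch short
    x = 1
    for j in range(d - 1, -1, -1):
        x = (x * x) % p
        x = (x * table[digits[j]]) % p
    return x
```

In a real deployment the table would be built once and reused for every exponent; this sketch also makes no attempt at constant-time execution.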

Algorithm 2 Fixed-Base Radix-R Exponentiation
Require: $k = (k_{l-1}, \ldots, k_0)_R$, $g$ a generator of $G$.
Ensure: $X = g^k$
1: Offline precomputation. Store $T[a][j] \leftarrow g^{a R^j}$, with $a \in [0, \ldots, R-1]$ and $0 \le j < l$.
2: $X \leftarrow 1$
3: for $i$ from $l-1$ downto $0$ do
4:   $X \leftarrow X \cdot T[k_i][i]$
5: return $(X)$

The integer $w$ is the number of comb-teeth in each $K_j$ and $d = \lceil t/w \rceil$ is the distance in bits between two consecutive teeth. When all the possible values $g^{K_j}$ are precomputed and stored in a table indexed by $I_{K_j} = [k_{(w-1)d+j}\, k_{(w-2)d+j} \ldots k_j]_2$, one can compute $g^k$ with a look-up table of size $2^w$, $\lceil t/w \rceil - 1$ multiplications and $\lceil t/w \rceil - 1$ squarings using (1). This method is shown in Algorithm 3. As in the case of the Radix-R method, this approach can be implemented in constant time if the multiplications by 1 (which occur when $K_j = 0$) are computed as an arbitrary multiplication, or by using the recoding of [10] which renders all comb coefficients $\neq 0$.

Algorithm 3 Fixed-Base Comb Exponentiation [14]
Require: $k = (k_{t-1}, \ldots, k_1, k_0)_2$, a generator $g$ of $G$, a window width $w \ge 2$ and $d = \lceil t/w \rceil$.
Ensure: $X = g^k \bmod p$
1: Offline precomputation. For all $(a_{w-1}, \ldots, a_0) \in \{0, 1\}^w$ we set $a = a_{w-1} 2^{(w-1)d} + \cdots + a_1 2^d + a_0$ and $T[(a_{w-1}, \ldots, a_0)_2] = g^a$.
2: Split $k = \sum_{j=0}^{d-1} K_j 2^j$ as in (1)
3: $X \leftarrow 1$
4: for $j$ from $d-1$ downto $0$ do
5:   $X \leftarrow X^2$
6:   $X \leftarrow X \cdot T[K_j]$
7: return $(X)$

Fixed base exponentiation with $\mathrm{NAF}_w$. In [17], the authors proposed an alternative approach for the case where inverting an element in the group $G$ is almost free of computation and multi-squarings can be computed efficiently. Their main application is the group of points on an elliptic curve, where computing the inverse of a point is really cheap. They

use a $\mathrm{NAF}_w$ representation of $k$ in order to reduce the number of multiplications (this generalizes the approach of [21] which uses a NAF representation of $k$). Specifically, they start by computing the $\mathrm{NAF}_w$ representation of the exponent $k$

$$k = k'_{t-1} 2^{t-1} + k'_{t-2} 2^{t-2} + \cdots + k'_0$$

where each non-zero $k'_i \in \{\pm 1, \pm 3, \ldots, \pm(2^{w-1} - 1)\}$ and there are at least $w - 1$ zeros between two non-zero coefficients. For more details on $\mathrm{NAF}_w$ the reader may refer to [9]. Then they rewrite this $\mathrm{NAF}_w(k)$ into $l = \lceil t/w \rceil$ consecutive windows of $w$ coefficients:

$$k = \sum_{i=0}^{l-1} \underbrace{\left( \sum_{j=0}^{w-1} k'_{iw+j}\, 2^j \right)}_{K_i} 2^{iw}. \qquad (2)$$

In [17] the authors noticed that, in each $K_i$, there is at most one non-zero coefficient $k'_{iw+j}$, which means that $K_i = s_i a_i 2^{j_i}$ for some $s_i \in \{-1, 1\}$, $a_i \in \{1, 3, \ldots, 2^{w-1} - 1\}$ and $0 \le j_i < w$. They then reorder the terms in expression (2) by splitting the index $i$ into two parts $i = i_1 e + i_0$ for some fixed integer $e$:

$$k = \sum_{i_0=0}^{e-1} \sum_{i_1=0}^{d-1} K_{i_1 e + i_0}\, 2^{i_1 e w + i_0 w} = \sum_{i_0=0}^{e-1} \left( \sum_{i_1=0}^{d-1} K_{i_1 e + i_0}\, 2^{i_1 e w} \right) 2^{i_0 w} \quad \text{where } d = \lceil l/e \rceil. \qquad (3)$$

For all possible values of $K_{i_1 e + i_0}\, 2^{i_1 e w}$ with $K_{i_1 e + i_0} = s a 2^j$, the term $g^{a 2^{j + i_1 e w}}$ is stored in a table $T[a][i_1][j]$. Then Algorithm 4 computes $g^k$ based on (3) as a sequence of multiplications/divisions (in Step 9, depending on $s = 1$ or $s = -1$) and $w$ consecutive squarings (in Step 5).

Algorithm 4 Fixed-Base Exponentiation with $\mathrm{NAF}_w$ [17]
Require: A scalar $k = (k'_{t-1}, \ldots, k'_1, k'_0)_{\mathrm{NAF}_w}$ and $g$ in an abelian group $G$, and positive integers $e$, $w$.
Ensure: $X = g^k$
1: $l = \lceil t/w \rceil$ and $d = \lceil l/e \rceil$
2: Offline precomputation. $T[a][i_1][j] = g^{a 2^{j + e w i_1}}$ for all $a \in \{1, 3, \ldots, 2^{w-1} - 1\}$, $i_1 \in \{0, \ldots, d-1\}$ and $j \in \{0, \ldots, w-1\}$.
3: $X \leftarrow 1$
4: for $i_0$ from $e-1$ downto $0$ do
5:   $X \leftarrow X^{2^w}$
6:   for $i_1$ from $d-1$ downto $0$ do
7:     $(s, a, j)$ such that $K_{i_1 e + i_0} = s \cdot a \cdot 2^j$
8:     if $a \neq 0$ then
9:       $X \leftarrow X \cdot (T[a][i_1][j])^s$
10: return $(X)$

In Algorithm 4 the number of precomputed elements is equal to $d\, w\, 2^{w-2} = \frac{t}{e}\, 2^{w-2}$. The online computation consists of $w(e-1)$ squarings and $e d \left(1 - \left(\frac{w}{w+1}\right)^w\right) = \frac{t}{w} \left(1 - \left(\frac{w}{w+1}\right)^w\right)$ multiplications/divisions (cf. [17] for details).

III. FIXED-BASE EXPONENTIATION WITH MULTIPLICATIVE SPLITTING WITH $R = m_0 m_1$

We now present our approach of a Fixed-base exponentiation with multiplicative splitting with $R = m_0 m_1$. In this section, we review the method presented in a preliminary work at WAIFI 2016 [20]. The goal is to use a multiplicative splitting of the digits of $k$ in order to provide a better trade-off between storage and online computation in the exponentiation.

A. Digit multiplicative splitting for radix $R = m_0 m_1$

A natural way to get a splitting of the digits is to use the RNS representation in radix $R = m_0 m_1$, which splits any digit into two parts. When all the digits of an exponent are split, we can process the exponentiation as follows: the first part of the digits will be used to select the precomputed values and the second part will be processed by online computation.

We first recall the RNS representation in a base $\mathcal{B} = \{m_0, m_1\}$. Let $R = m_0 m_1$ and $x \in \mathbb{Z}$ such that $0 \le x < R$. Let us also assume that $m_0$ is prime, since this allows us to invert all non-zero integers $< m_0$ modulo $m_0$, and we choose $m_1 < m_0$. In the sequel, we denote $|x|_m = x \bmod m$. One represents $x$ with the residues

$$x^{(0)} = |x|_{m_0}, \qquad x^{(1)} = |x|_{m_1},$$

and $x$ can be retrieved using the Chinese Remainder Theorem as follows:

$$x = \left| x^{(0)} \cdot m_1 \cdot |m_1^{-1}|_{m_0} + x^{(1)} \cdot m_0 \cdot |m_0^{-1}|_{m_1} \right|_R. \qquad (4)$$

We now present our recoding approach. We consider an exponent $k$ expressed in radix $R = m_0 m_1$:

$$k = \sum_{i=0}^{l-1} k_i R^i \quad \text{with } l = \lceil t / \log_2(R) \rceil.$$

We represent every radix-R digit in RNS with the RNS base $\mathcal{B} = \{m_0, m_1\}$: if $k_i$ is the $i$-th digit of $k$ in radix $R$, we denote by $(k_i^{(0)}, k_i^{(1)})$ its RNS representation in base $\mathcal{B}$:

$$k_i^{(0)} = |k_i|_{m_0}, \qquad k_i^{(1)} = |k_i|_{m_1}.$$

Let us denote

$$m'_0 = m_1 \cdot |m_1^{-1}|_{m_0}, \qquad m'_1 = m_0 \cdot |m_0^{-1}|_{m_1}.$$

We recode the digits of $k$ in $\mathcal{B} = \{m_0, m_1\}$ as follows.

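If $k_i^{(1)} \neq 0$: we denote

$${k'_i}^{(0)} = \left| k_i^{(0)} (k_i^{(1)})^{-1} \right|_{m_0}, \qquad {k'_i}^{(1)} = k_i^{(1)}.$$

One keeps

$$k'_i = {k'_i}^{(1)} \cdot \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R \qquad (5)$$

as a representation of $k_i$ in a multiplicative splitting form, and we have $k_i = |k'_i|_R$ with (4). When modifying the digits of $k$ as above, one needs to take into account the correcting term due to the reduction modulo $R$:

$$k_i = {k'_i}^{(1)} \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R - \left\lfloor {k'_i}^{(1)} \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R / R \right\rfloor R.$$

Let us denote $C = \left\lfloor {k'_i}^{(1)} \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R / R \right\rfloor$, which satisfies $0 \le C < m_1$. We consider $C$ as a carry that one can subtract from $k_{i+1}$. This leads to the following computation:

if $k_{i+1} \ge C$ then $k_{i+1} \leftarrow k_{i+1} - C$, $C \leftarrow 0$
else $k_{i+1} \leftarrow k_{i+1} + R - C$, $C \leftarrow 1$

and one gets $k_{i+1} \ge 0$.

If $k_i^{(1)} = 0$: we define $k'_i$ as follows

$$k'_i = \underbrace{\left| |k_i^{(0)} + 1|_{m_0}\, m'_0 + m'_1 \right|_R}_{(\star)} - \underbrace{|m'_0 + m'_1|_R}_{=1} \qquad (6)$$

and $k'_i$ satisfies $|k'_i|_R = k_i$. This expression is meant to have the part $(\star)$ as in (5): the goal is to use the same precomputed data in the exponentiation algorithm. The term $|m'_0 + m'_1|_R = 1$ is meant to get back to $k_i$ while reducing $k'_i$ modulo $R$. We then set the following coefficients:

$${k'_i}^{(0)} = |k_i^{(0)} + 1|_{m_0}, \qquad {k'_i}^{(1)} = 0.$$

Setting ${k'_i}^{(1)} = 0$ tells us that this is a special case, and we get $k'_i$ from ${k'_i}^{(0)}$ as

$$k'_i = \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R - 1.$$

We deal with the carry as was done when $k_i^{(1)} \neq 0$; this is detailed in the algorithm.

One notices that it might be necessary to handle the last carry $C$ generated by the recoding of $k_{l-1}$ with a final correction. This gives a final coefficient $k'_l = -C$ which satisfies $|k'_l| < m_1$.

Finally, this leads to the recoding algorithm shown in Algorithm 5.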
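The recoding steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the names `digit_value` and `recode_m0m1` are hypothetical, the final coefficient is returned with the sign convention $k'_l = -C$ (so that $k = \sum_{i=0}^{l} k'_i R^i$), and `pow(x, -1, m)` for modular inverses assumes Python 3.8 or later.

```python
def digit_value(k0p, k1p, m0, m1):
    """Integer value k'_i encoded by a recoded pair (k'^(0), k'^(1))."""
    R = m0 * m1
    m0p = m1 * pow(m1, -1, m0)        # m'_0: equals 1 mod m0 and 0 mod m1
    m1p = m0 * pow(m0, -1, m1)        # m'_1: equals 0 mod m0 and 1 mod m1
    core = (k0p * m0p + m1p) % R
    # Special case k'^(1) = 0 uses expression (6), otherwise expression (5).
    return core - 1 if k1p == 0 else k1p * core


def recode_m0m1(k, m0, m1, l):
    """Multiplicative splitting recoding of k in radix R = m0*m1 (l digits).

    Returns the list of pairs (k'^(0), k'^(1)) and the final coefficient
    k'_l = -C, assuming m0 is prime and m1 < m0.
    """
    R = m0 * m1
    digits, C = [], 0
    for i in range(l):
        ki = (k // R**i) % R - C      # radix-R digit minus the incoming carry
        C = 0
        if ki < 0:                    # borrow from the next digit
            ki += R
            C = 1
        k0, k1 = ki % m0, ki % m1     # RNS residues of the digit
        if k1 == 0:
            pair = ((k0 + 1) % m0, 0)
        else:
            pair = ((k0 * pow(k1, -1, m0)) % m0, k1)
        C += digit_value(pair[0], pair[1], m0, m1) // R   # carry of this digit
        digits.append(pair)
    return digits, -C
```

With `m0 = 11`, `m1 = 8` and `k = 936192`, the function returns the pairs `(5, 0), (2, 6), (8, 5), (3, 7)` and a final coefficient of `-2`, which can be checked by summing `digit_value(...) * R**i` over the digits and adding `-2 * R**4`.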

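Algorithm 5 Multiplicative Splitting Recoding with $R = m_0 m_1$
Require: An RNS base $\{m_0, m_1\}$, a radix $R = m_0 m_1$ and an exponent $k = \sum_{i=0}^{l-1} k_i R^i$.
Ensure: $\{({k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ (k'_l)\}$ the multiplicative splitting recoding of $k$ in radix $R = m_0 m_1$.
1: $C \leftarrow 0$
2: for $i$ from $0$ to $l-1$ do
3:   $k_i \leftarrow k_i - C$, $C \leftarrow 0$
4:   if $k_i < 0$ then
5:     $k_i \leftarrow k_i + R$, $C \leftarrow 1$
6:   $k_i^{(0)} \leftarrow |k_i|_{m_0}$, $k_i^{(1)} \leftarrow |k_i|_{m_1}$
7:   if $k_i^{(1)} = 0$ then
8:     $({k'_i}^{(0)}, {k'_i}^{(1)}) \leftarrow (|k_i^{(0)} + 1|_{m_0}, 0)$
9:     $C \leftarrow C + \left\lfloor \left( \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R - 1 \right) / R \right\rfloor$
10:   else
11:     ${k'_i}^{(0)} \leftarrow |k_i^{(0)} (k_i^{(1)})^{-1}|_{m_0}$
12:     ${k'_i}^{(1)} \leftarrow k_i^{(1)}$
13:     $C \leftarrow C + \left\lfloor {k'_i}^{(1)} \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R / R \right\rfloor$
14: return $\{({k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ k'_l = -C\}$

At the end, the recoded exponent $k = \sum_{i=0}^{l} k'_i R^i$ has most of its digits $k'_i$ expressed as a product ${k'_i}^{(1)} \cdot |{k'_i}^{(0)} m'_0 + m'_1|_R$, where ${k'_i}^{(1)}$ is of size $m_1$ while $|{k'_i}^{(0)} m'_0 + m'_1|_R$ is indexed with ${k'_i}^{(0)}$, which is of size $m_0$.

Example 1. We present here an example of the $m_0 m_1$ recoding with an exponent size $t$ of 20 bits ($0 < k < 2^{20}$), and $\mathcal{B} = \{11, 8\}$ (i.e. $m_0 = 11$, $m_1 = 8$). Thus, in this case, one has the radix $R = m_0 m_1 = 88$, $l = \lceil 20 / \log_2(88) \rceil = 4$, and also

$$m'_0 = 8 \cdot |8^{-1}|_{11} = 56, \qquad m'_1 = 11 \cdot |11^{-1}|_8 = 33.$$

Let us take $k = 936192_{10}$ as the random exponent. By rewriting $k$ in radix $R$, one has

$$k = 48 + 78 \cdot 88 + 32 \cdot 88^2 + 1 \cdot 88^3.$$

We now use Algorithm 5, which consists of a for loop (Steps 2 to 13).

In the first iteration ($i = 0$), one has $k_0 = 48$. One has $C \leftarrow 0$ and one skips the if-test of Steps 4 to 5 since $k_0 \ge 0$.
Step 6: one computes the RNS representation in base $\mathcal{B}$ of $k_0 = 48$:
$$k_0^{(0)} = |k_0|_{11} = 4, \qquad k_0^{(1)} = |k_0|_8 = 0.$$
Steps 7 to 9: since $k_0^{(1)} = 0$, one sets
$$({k'_0}^{(0)}, {k'_0}^{(1)}) \leftarrow (|k_0^{(0)} + 1|_{11}, 0) = (5, 0)$$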

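and the carry
$$C \leftarrow C + \left\lfloor \left( \left| {k'_0}^{(0)} \cdot 56 + 33 \right|_{88} - 1 \right) / 88 \right\rfloor = 0.$$

In the second iteration ($i = 1$), one has $k_1 = 78$. One has $C \leftarrow 0$ and one skips the if-test of Steps 4 to 5 since $k_1 \ge 0$.
Step 6: one computes the RNS representation in base $\mathcal{B}$ of $k_1 = 78$:
$$k_1^{(0)} = |k_1|_{11} = 1, \qquad k_1^{(1)} = |k_1|_8 = 6.$$
Steps 10 to 13: since $k_1^{(1)} \neq 0$, one has
$$|(k_1^{(1)})^{-1}|_{11} = 2, \quad {k'_1}^{(0)} = |k_1^{(0)} (k_1^{(1)})^{-1}|_{11} = 2, \quad {k'_1}^{(1)} = k_1^{(1)} = 6, \quad C \leftarrow \left\lfloor \left( {k'_1}^{(1)} \left| {k'_1}^{(0)} \cdot 56 + 33 \right|_{88} \right) / 88 \right\rfloor = 3.$$

In the third iteration ($i = 2$), one now has $k_2 \leftarrow k_2 - C = 29$. The RNS representation in base $\mathcal{B}$ of $k_2$ is $k_2^{(0)} = 7$, $k_2^{(1)} = 5$. Steps 10 to 13 give $C \leftarrow 2$ and $({k'_2}^{(0)}, {k'_2}^{(1)}) \leftarrow (8, 5)$.

Without providing all the remaining details, one finally obtains the values returned by the algorithm: $((5, 0), (2, 6), (8, 5), (3, 7))$ and $k'_4 = -C = -2$.

B. Exponentiation with a multiplicative splitting recoding in radix $R = m_0 m_1$

We first rewrite the exponentiation using the recoding $k = \sum_{i=0}^{l} k'_i R^i$ of the previous subsection as follows:

$$g^k \bmod p = g^{\sum_{i=0}^{l} k'_i R^i} = g^{k'_l R^l} \cdot \prod_{i=0}^{l-1} g^{k'_i R^i} \qquad (7)$$

where each term $g^{k'_i R^i}$ satisfies one of the following three cases:

When ${k'_i}^{(1)} \neq 0$ and $i < l$:
$$g^{k'_i R^i} = \left( g^{R^i \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R} \right)^{{k'_i}^{(1)}}.$$
When ${k'_i}^{(1)} = 0$ and $i < l$:
$$g^{k'_i R^i} = g^{R^i \left| {k'_i}^{(0)} m'_0 + m'_1 \right|_R} \cdot g^{-R^i}.$$
When $i = l$, we have $k'_l \le 0$, which implies that $g^{k'_l R^l} = (g^{-R^l})^{|k'_l|}$.

In order to compute the fixed-base exponentiation $g^k$, one stores the following values:

$$T[i][j] = g^{R^i \left| j\, m'_0 + m'_1 \right|_R}, \quad \text{with } 0 \le i \le l-1,\ 0 \le j < m_0,$$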

and one also stores the following inverses:

$$T[i][-1] = g^{-R^i} \quad \text{with } 0 \le i \le l.$$

We use $Y_j$ to denote the product of the terms $g^{R^i |{k'_i}^{(0)} m'_0 + m'_1|_R}$ for each $i$ such that ${k'_i}^{(1)} = j$. In other words, for $j \neq 0$,

$$Y_j = \begin{cases} \left( \prod_{i \,:\, {k'_i}^{(1)} = j,\ i < l} T[i][{k'_i}^{(0)}] \right) \cdot T[l][-1] & \text{if } |k'_l| = j, \\ \prod_{i \,:\, {k'_i}^{(1)} = j,\ i < l} T[i][{k'_i}^{(0)}] & \text{otherwise,} \end{cases}$$

and

$$Y_0 = \prod_{i \,:\, {k'_i}^{(1)} = 0,\ i < l} T[i][{k'_i}^{(0)}] \cdot T[i][-1].$$

We can then rewrite the expression of $g^k$ in (7) in terms of $Y_j$ for $j = 0, \ldots, m_1 - 1$ as follows:

$$g^k = Y_0 \cdot \prod_{j=1}^{m_1 - 1} Y_j^{\,j}.$$

Each individual exponentiation $Y_j^{\,j}$ is performed with a square-and-multiply approach, which is more efficient than performing $j - 1$ multiplications, even for small $m_1$. This approach is depicted in Algorithm 6. One important drawback of the above algorithm is that it is not constant time, due to the "if" branching attached to the condition ${k'_i}^{(1)} = 0$.

Example 2. We present the computation of $g^k \bmod p$ using Algorithm 6; we take $\mathcal{B} = \{11, 8\}$ (i.e. $m_0 = 11$, $m_1 = 8$). In terms of storage, one computes the values $T[i][j] = g^{R^i |j\, m'_0 + m'_1|_R} \bmod p$ with $0 \le i \le l-1$. One has the values $\{33, 1, 57, 25, 81, 49, 17, 73, 41, 9, 65\}$ for $|j\, m'_0 + m'_1|_R$ when $0 \le j < 11$. This leads to

$$T[i][0..10] = \{ g^{88^i \cdot 33}, g^{88^i}, g^{88^i \cdot 57}, g^{88^i \cdot 25}, g^{88^i \cdot 81}, g^{88^i \cdot 49}, g^{88^i \cdot 17}, g^{88^i \cdot 73}, g^{88^i \cdot 41}, g^{88^i \cdot 9}, g^{88^i \cdot 65} \}.$$

The trace of Algorithm 6 for the computation of $g^k$ with $k = 936192$, using the recoding obtained in Example 1, is provided in Table I.

C. Complexity

The amount of precomputed data is equal to $(m_0 + 1) \cdot l + 1$ elements. The complexity of the online computation in Algorithm 6 is evaluated step by step in Table III for the average case. The number of multiplications (M) is evaluated as follows. The costs of Steps 6 to 15 follow directly from Algorithm 6 and are detailed in Table III. The first squaring in Step 18 is skipped since $X = 1$, leading to a cost of $W - 1$ squarings. The multiplications in Steps 21 and 22 are performed only in the case $Y_j \neq 1$. This means that in the worst case we only save the first multiplication, which is a plain assignment: this is the case considered in Table III.

For the sake of simplicity, we denote by $H$ the sum of the Hamming weights of $j$ for each $j$ from $m_1 - 1$ downto $1$ (for loop in Step 19). The value of $H$ is shown in Table II for different practical values of $m_1$.
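The accumulation of the $Y_j$ and the final square-and-multiply reconstruction described in this section can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: the function name `fixed_base_pow_m0m1` is hypothetical, the group is the integers modulo `p` with `g` invertible, the "skip the multiplication when $Y_j = 1$" shortcuts are omitted, the tables are rebuilt on every call for brevity, and negative exponents in `pow` assume Python 3.8 or later. The recoded digits are passed in, with the final coefficient `k_last` $= k'_l \le 0$.

```python
def fixed_base_pow_m0m1(g, p, digits, k_last, m0, m1):
    """Compute g^k mod p from the m0*m1 multiplicative splitting recoding of k.

    digits is the list of pairs (k'^(0), k'^(1)); k_last is k'_l <= 0.
    """
    R = m0 * m1
    l = len(digits)
    m0p = m1 * pow(m1, -1, m0)                      # m'_0
    m1p = m0 * pow(m0, -1, m1)                      # m'_1

    # Offline precomputation: (m0 + 1) * l + 1 group elements.
    T = [[pow(g, R**i * ((j * m0p + m1p) % R), p) for j in range(m0)]
         for i in range(l)]
    T_inv = [pow(g, -(R**i), p) for i in range(l + 1)]   # T[i][-1] = g^(-R^i)

    # Online accumulation of the Y_j.
    Y = [1] * m1
    for i, (k0p, k1p) in enumerate(digits):
        Y[k1p] = Y[k1p] * T[i][k0p] % p
        if k1p == 0:                                 # special case of (6)
            Y[0] = Y[0] * T_inv[i] % p
    if k_last != 0:                                  # final carry coefficient
        Y[abs(k_last)] = Y[abs(k_last)] * T_inv[l] % p

    # Reconstruction: X = prod_{j>=1} Y_j^j by scanning the bits of every j.
    X = 1
    for b in range(m1.bit_length() - 1, -1, -1):
        X = X * X % p
        for j in range(m1 - 1, 0, -1):
            if (j >> b) & 1:
                X = X * Y[j] % p
    return X * Y[0] % p
```

For the recoding of Example 1, `fixed_base_pow_m0m1(2, 1000003, [(5, 0), (2, 6), (8, 5), (3, 7)], -2, 11, 8)` should agree with `pow(2, 936192, 1000003)`; the modulus and base here are arbitrary toy values chosen for the illustration.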

Algorithm 6 Fixed-base exponentiation with multiplicative splitting with radix $R = m_0 m_1$
Require: An RNS base $\{m_0, m_1\}$, a radix $R = m_0 m_1$, the exponent $k = \sum_{i=0}^{l-1} k_i R^i$, $\{({k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ (k'_l)\}$ the $m_0 m_1$ recoding of $k$, and $g \in G$.
Ensure: $A = g^k$
1: Offline precomputation. Store $T[i][j] \leftarrow g^{R^i |j\, m'_0 + m'_1|_R}$ with $0 \le i < l$, $0 \le j < m_0$, and $T[i][-1] \leftarrow g^{-R^i}$, $0 \le i \le l$
2: $X \leftarrow 1$, $Y_j \leftarrow 1$ for $0 \le j < m_1$
3: for $i$ from $0$ to $l-1$ do
4:   if ${k'_i}^{(1)} = 0$ then
5:     if $Y_0 = 1$ then
6:       $Y_0 \leftarrow T[i][{k'_i}^{(0)}] \cdot T[i][-1]$
7:     else
8:       $Y_0 \leftarrow Y_0 \cdot T[i][{k'_i}^{(0)}] \cdot T[i][-1]$
9:   else
10:     if $Y_{{k'_i}^{(1)}} = 1$ then
11:       $Y_{{k'_i}^{(1)}} \leftarrow T[i][{k'_i}^{(0)}]$
12:     else
13:       $Y_{{k'_i}^{(1)}} \leftarrow Y_{{k'_i}^{(1)}} \cdot T[i][{k'_i}^{(0)}]$
14: if $k'_l \neq 0$ then
15:   $Y_{|k'_l|} \leftarrow Y_{|k'_l|} \cdot T[l][-1]$
16: $W \leftarrow$ size of $m_1$ in bits
17: for $i$ from $W-1$ downto $0$ do
18:   $X \leftarrow X^2$
19:   for $j$ from $m_1 - 1$ downto $1$ do
20:     if bit $i$ of $j$ is non zero then
21:       $X \leftarrow X \cdot Y_j$
22: return $(X \cdot Y_0)$

IV. FIXED BASE EXPONENTIATION WITH MULTIPLICATIVE SPLITTING WITH R PRIME

In this section we present a novel recoding algorithm based on multiplicative splitting modulo a prime $R$. We will show that the resulting exponentiation algorithm can be made constant time.

A. Digit multiplicative splitting for prime radix R

We present in this subsection a variant of the multiplicative splitting for the case of a prime radix $R$. When $R$ is a prime, we can use a multiplicative splitting modulo $R$ based on an extension of the half-size multiplicative splitting of [19]. Our goal is to get the following splitting for a fixed bound $0 < c < R$:

$$k = k^{(0)} (k^{(1)})^{-1} \bmod R \quad \text{with } |k^{(0)}| < c,\ |k^{(1)}| \le R/c. \qquad (8)$$

Table I
EXAMPLE OF AN EXECUTION TRACE FOR AN EXPONENTIATION BASED ON MULTIPLICATIVE SPLITTING RECODING WITH $R = m_0 m_1$

| Iter. (loop 3:) | Exp. coef. | Step | Value |
|---|---|---|---|
| $i = 0$ | ${k'_0}^{(0)} = 5$, ${k'_0}^{(1)} = 0$ | 6: | $Y_0 \leftarrow T[0][{k'_0}^{(0)}] \cdot T[0][-1] = g^{49} \cdot g^{-1} = g^{48}$ |
| $i = 1$ | ${k'_1}^{(0)} = 2$, ${k'_1}^{(1)} = 6$ | 11: | $Y_6 \leftarrow T[1][{k'_1}^{(0)}] = g^{88 \cdot 57} = g^{5016}$ |
| $i = 2$ | ${k'_2}^{(0)} = 8$, ${k'_2}^{(1)} = 5$ | 11: | $Y_5 \leftarrow T[2][{k'_2}^{(0)}] = g^{88^2 \cdot 41} = g^{317504}$ |
| $i = 3$ | ${k'_3}^{(0)} = 3$, ${k'_3}^{(1)} = 7$ | 11: | $Y_7 \leftarrow T[3][{k'_3}^{(0)}] = g^{88^3 \cdot 25} = g^{17036800}$ |
| - | - | 15: | $Y_2 \leftarrow T[4][-1] = g^{88^4 \cdot (-1)} = g^{-59969536}$ |
| - | - | 17: to 22: | $g^k = Y_0 \cdot \prod_{j=1}^{m_1-1} Y_j^{\,j} = g^{48} \cdot g^{2 \cdot (-59969536)} \cdot g^{5 \cdot 317504} \cdot g^{6 \cdot 5016} \cdot g^{7 \cdot 17036800} = g^{936192}$ |

Table II
HAMMING WEIGHTS ACCOUNT FOR $0 \le j < m_1$

| $m_1$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| $H$ | 1 | 2 | 4 | 5 | 7 | 9 | 12 | 13 |

1) Multiplicative splitting modulo a prime R: The multiplicative splitting modulo a prime radix $R$ is based on the extended Euclidean algorithm. We briefly review this algorithm. We consider a prime integer $R$ and $0 < k < R$. Then $k$ and $R$ are coprime: $\gcd(k, R) = 1$. The Euclidean algorithm computes $\gcd(k, R)$ through a sequence of modular reductions:

$$r_0 = R,\quad r_1 = k,\quad r_2 = r_0 \bmod r_1,\quad \ldots,\quad r_{j+1} = r_{j-1} \bmod r_j,\quad \ldots$$

The sequence of remainders $r_j$ satisfies $\gcd(r_j, r_{j+1}) = \gcd(R, k)$ and is strictly decreasing, and thus reaches 0 after some iterations. The last $r_l \neq 0$ satisfies $r_l = \gcd(k, R) = 1$. The extended Euclidean algorithm computes a Bezout relation $uR + vk = \gcd(k, R)$

by maintaining two sequences of integers $u_j$ and $v_j$ satisfying:

$$u_j R + v_j k = r_j, \quad \text{for } j = 0, 1, \ldots, l. \qquad (9)$$

The sequence $v_j$ is an increasing sequence in magnitude, starting from $v_0 = 0$ and $v_1 = 1$. The multiplicative splitting of (8) can then be obtained from (9), where we take $j$ such that $r_j \in [0, c[$ and $|v_j| \in [0, R/c]$, by taking $k^{(0)} = r_j$ and $k^{(1)} = v_j$. The following lemma establishes this property.

Lemma 1. If one chooses $c \in [0, R[$, there exists $j$ such that $r_j \ge c$ and $r_{j+1} < c$ and at the same time $|v_j| \le R/c$ and $|v_{j+1}| \le R/c$.

The proof of the lemma is given in the appendix. This leads to the method shown in Algorithm 7 for multiplicative splitting modulo a prime radix $R$. In this algorithm a third variable $s$ is used for the sign of the multiplicative splitting.

Table III
COMPLEXITY OF EXPONENTIATION BASED ON MULTIPLICATIVE SPLITTING RECODING WITH $R = m_0 m_1$

| Number of occurrences | Step | Operation | Cost |
|---|---|---|---|
| $1$ | Step 6 | $T[i][{k'_i}^{(0)}] \cdot T[i][-1]$ | 1 M |
| $l/m_1 - 1$ | Step 8 | $Y_0 \cdot T[i][{k'_i}^{(0)}] \cdot T[i][-1]$ | 2 M |
| $m_1 - 1$ | Step 11 | - | - |
| $l \cdot \frac{m_1 - 1}{m_1} - (m_1 - 1)$ | Step 13 | $Y_{{k'_i}^{(1)}} \cdot T[i][{k'_i}^{(0)}]$ | 1 M |
| $1$ | Step 15 | $Y_{|k'_l|} \cdot T[l][-1]$ | 1 M |
| $W - 1$ | Step 18 | $X \leftarrow X^2$ | 1 S |
| $H - 1$ | Step 21 | $X \cdot Y_j$ | 1 M |
| $1$ | Step 22 | $X \cdot Y_0$ | 1 M |
| TOTAL | | | $\left( l \cdot \frac{m_1 + 1}{m_1} - m_1 + H + 1 \right)$ M $+\ (W - 1)$ S |
| TOTAL STORAGE | | | $(m_0 + 1) \cdot l + 1$ elements of $G$ |

2) Recoding the exponent: We now present our recoding approach for an integer $k$ given in radix-R representation:

$$k = \sum_{i=0}^{l-1} k_i R^i, \quad \text{with } l = \lceil t / \log_2(R) \rceil.$$

We choose a splitting bound $c$ and we consider a digit $k_i \neq 0$. Using Algorithm 7 we get $s_i$, ${k'_i}^{(0)}$ and ${k'_i}^{(1)}$ such that

$$k_i = s_i\, {k'_i}^{(0)} ({k'_i}^{(1)})^{-1} \bmod R \quad \text{with } s_i \in \{-1, 1\},\ {k'_i}^{(0)} \in [0, c[,\ {k'_i}^{(1)} \in [0, R/c]. \qquad (10)$$
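The truncated extended Euclidean splitting used here can be sketched in Python as follows. It is a minimal sketch, assuming $R$ prime and $2 \le c < R$; the function name `truncated_eea` is chosen for the illustration, only the $v_j$ sequence is tracked (the $u_j$ are not needed for the splitting), and the returned triple follows the convention $(s, k^{(0)}, k^{(1)})$ with $k \equiv s\, k^{(0)} (k^{(1)})^{-1} \pmod R$.

```python
def truncated_eea(k, R, c):
    """Split k modulo the prime R as s * k0 * k1^(-1), with k0 < c and k1 <= R/c.

    Returns (1, 0, 0) when k is a multiple of R. Assumes 2 <= c < R.
    """
    r0, r1 = R, k % R
    if r1 == 0:
        return (1, 0, 0)
    v0, v1 = 0, 1                    # invariant: v_j * k = r_j (mod R)
    while r1 >= c:                   # stop at the first remainder below c
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        v0, v1 = v1, v0 - q * v1
    s = 1 if v1 > 0 else -1
    return (s, r1, abs(v1))
```

For instance, with $R = 89$ and $c = 8$, `truncated_eea(74, 89, 8)` returns `(-1, 1, 6)`: indeed $6^{-1} \bmod 89 = 15$ and $-15 \equiv 74 \pmod{89}$.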

Algorithm 7 Truncated Extended Euclidean Algorithm (TruncatedEEA($k$, $R$, $c$))
Require: $k \in \mathbb{Z}$, the prime radix $R$, and $c$, the upper bound for $k^{(0)}$.
Ensure: $(s, k^{(0)}, k^{(1)})$ such that $k = |s \cdot k^{(0)} \cdot (k^{(1)})^{-1}|_R$ with $0 \le k^{(0)} < c$, $0 \le k^{(1)} \le R/c$ and $s \in \{-1, 1\}$ when $\gcd(k, R) = 1$.
1: if $\gcd(k, R) = R$ then
2:   return $(1, 0, 0)$
3: else
4:   $u_0 \leftarrow 1$, $v_0 \leftarrow 0$, $r_0 \leftarrow R$, $u_1 \leftarrow 0$, $v_1 \leftarrow 1$, $r_1 \leftarrow |k|_R$
5:   while ($r_1 \ge c$) do
6:     $q \leftarrow \lfloor r_0 / r_1 \rfloor$, $r_2 \leftarrow |r_0|_{r_1}$
7:     $u_2 \leftarrow u_0 - q \cdot u_1$, $v_2 \leftarrow v_0 - q \cdot v_1$
8:     $(u_0, v_0, r_0) \leftarrow (u_1, v_1, r_1)$
9:     $(u_1, v_1, r_1) \leftarrow (u_2, v_2, r_2)$
10:   $s \leftarrow \mathrm{sign}(v_1)$, $k^{(0)} \leftarrow r_1$, $k^{(1)} \leftarrow |v_1|$
11:   return $(s, k^{(0)}, k^{(1)})$

We set apart the case $k_i = 0$, which is recoded as $(1, 0, 0)$ (cf. Step 2 of Algorithm 7). We handle the reduction modulo $R$ as follows:

$$C = \left( s_i\, {k'_i}^{(0)} |({k'_i}^{(1)})^{-1}|_R - k_i \right) / R \quad \text{(exact quotient)}, \qquad k_i = s_i\, {k'_i}^{(0)} |({k'_i}^{(1)})^{-1}|_R - C R.$$

One notices that $C$ satisfies $-c \le C < c$. We then consider $C$ as a carry that we subtract from $k_{i+1}$. We obtain an expression $k = \sum_{i=0}^{l} k'_i R^i$ of $k$ in radix $R$ such that each digit $k'_i = s_i\, {k'_i}^{(0)} |({k'_i}^{(1)})^{-1}|_R$ is given in a multiplicative splitting form. The last coefficient $k'_l = -C$ is necessary to handle the last carry. The resulting recoding algorithm is shown in Algorithm 8.

Algorithm 8 Multiplicative Splitting Recoding for R Prime
Require: $R$ prime, $k = \sum_{i=0}^{l-1} k_i R^i$, and $c$ the splitting bound.
Ensure: $\{(s_i, {k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ (k'_l)\}$ the multiplicative splitting recoding of $k$.
1: $C \leftarrow 0$
2: for $i$ from $0$ to $l-1$ do
3:   $k_i \leftarrow k_i - C$
4:   $s_i, {k'_i}^{(0)}, {k'_i}^{(1)} \leftarrow$ TruncatedEEA($k_i$, $R$, $c$)
5:   $C \leftarrow \left( s_i\, {k'_i}^{(0)} |({k'_i}^{(1)})^{-1}|_R - k_i \right) / R$ // exact quotient
6: return $\{(s_i, {k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ (k'_l = -C)\}$

Example 3. We present an example of multiplicative splitting recoding for a prime radix $R = 89$ with an exponent size $t$ of 20 bits ($0 < k < 2^{20}$). In this case, one has $l = \lceil 20 / \log_2(89) \rceil = 4$. One also sets $c = 2^3 = 8$, and then $\lceil R/c \rceil = 12$. Let us take $k = 901644_{10}$ as the random exponent. By rewriting $k$ in radix $R$, one has

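$$k = 74 + 73 \cdot 89 + 24 \cdot 89^2 + 1 \cdot 89^3.$$

The execution trace of Algorithm 8 is provided in Table IV.

Table IV
EXAMPLE OF AN EXECUTION TRACE OF ALGORITHM 8

| Iter. | Step | Value |
|---|---|---|
| $i = 0$ | 3: | $k_0 = 74$ does not change since $C = 0$ |
| | 4: | $s_0 = -1$, ${k'_0}^{(0)} = 1$, ${k'_0}^{(1)} = 6$ |
| | 5: | $C \leftarrow (s_0\, {k'_0}^{(0)} |({k'_0}^{(1)})^{-1}|_R - k_0)/R = -1$ |
| $i = 1$ | 3: | $k_1 \leftarrow 73 + 1 = 74$ since $C = -1$ |
| | 4: | $s_1 = -1$, ${k'_1}^{(0)} = 1$, ${k'_1}^{(1)} = 6$ |
| | 5: | $C \leftarrow (s_1\, {k'_1}^{(0)} |({k'_1}^{(1)})^{-1}|_R - k_1)/R = -1$ |
| $i = 2$ | 3: | $k_2 \leftarrow 24 + 1 = 25$ since $C = -1$ |
| | 4: | $s_2 = -1$, ${k'_2}^{(0)} = 3$, ${k'_2}^{(1)} = 7$ |
| | 5: | $C \leftarrow (s_2\, {k'_2}^{(0)} |({k'_2}^{(1)})^{-1}|_R - k_2)/R = -2$ |
| $i = 3$ | 3: | $k_3 \leftarrow 1 + 2 = 3$ since $C = -2$ |
| | 4: | $s_3 = 1$, ${k'_3}^{(0)} = 3$, ${k'_3}^{(1)} = 1$ |
| | 5: | $C \leftarrow (s_3\, {k'_3}^{(0)} |({k'_3}^{(1)})^{-1}|_R - k_3)/R = 0$ |
| Result | | $((-1, 1, 6), (-1, 1, 6), (-1, 3, 7), (1, 3, 1))$ and $k'_4 = -C = 0$ |

B. Exponentiation Algorithm with multiplicative splitting recoding in a prime radix R

We now present an exponentiation algorithm which takes advantage of the exponent recoding given in Section IV-A2. One wants to compute

$$g^k = g^{\sum_{i=0}^{l} k'_i R^i} = g^{k'_l R^l} \cdot \prod_{i=0}^{l-1} g^{k'_i R^i} \qquad (11)$$

with

$$g^{k'_i R^i} = g^{s_i\, {k'_i}^{(0)} |({k'_i}^{(1)})^{-1}|_R\, R^i} \ \text{ if } {k'_i}^{(1)} \neq 0, \qquad g^{k'_i R^i} = 1 \ \text{ if } {k'_i}^{(1)} = 0 \ \text{(this corresponds to } k_i = 0\text{)}.$$

In order to compute the fixed-base exponentiation $g^k \bmod p$, one stores the following values, for $0 \le i \le l-1$:

$$T[i][s][j] = g^{R^i \cdot s \cdot |j^{-1}|_R}, \quad \text{with } 1 \le j \le \lceil R/c \rceil,\ s \in \{-1, 1\},$$
$$T[i][s][0] = 1 \quad \text{with } s \in \{-1, 1\},$$
$$T[l][s] = g^{s R^l} \quad \text{with } s \in \{-1, 1\}.$$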

One denotes by $Y_j$ the product of the terms $g^{s_i |({k'_i}^{(1)})^{-1}|_R R^i}$ such that ${k'_i}^{(0)} = j$. This means that for $j \neq |k'_l|$ and for $j = |k'_l|$ one has, respectively,

$$Y_j = \prod_{i \,:\, {k'_i}^{(0)} = j} T[i][s_i][{k'_i}^{(1)}] \qquad \text{and} \qquad Y_j = \left( \prod_{i \,:\, {k'_i}^{(0)} = j} T[i][s_i][{k'_i}^{(1)}] \right) \cdot T[l][\mathrm{sgn}(k'_l)].$$

We can then rewrite the products in (11) in terms of $Y_j$ as follows:

$$g^k = \prod_{j \in \{1, \ldots, c-1\}} Y_j^{\,j}.$$

Every individual exponentiation $Y_j^{\,j}$ is performed with a square-and-multiply approach, which is more efficient than performing $j - 1$ multiplications, even for small $c$. This finally leads to the exponentiation shown in Algorithm 9.

Algorithm 9 Fixed-base exponentiation with multiplicative splitting for prime radix R
Require: $R$ a prime integer, an exponent $k = \sum_{i=0}^{l-1} k_i R^i$, $\{(s_i, {k'_i}^{(0)}, {k'_i}^{(1)}),\ 0 \le i < l,\ k'_l\}$ the multiplicative splitting recoding of $k$ in radix $R$, and $g \in G$.
Ensure: $X = g^k$
1: Offline precomputation. For $0 \le i \le l-1$, $1 \le j \le \lceil R/c \rceil$, $s \in \{-1, 1\}$ store $T[i][s][j] \leftarrow g^{R^i \cdot s \cdot |j^{-1}|_R}$, and $T[i][s][0] \leftarrow 1$ for $0 \le i \le l-1$, $s \in \{-1, 1\}$, and $T[l][s] \leftarrow g^{s R^l}$ for $s \in \{-1, 1\}$.
2: $X \leftarrow 1$, $Y_j \leftarrow 1$ for $0 \le j \le c$
3: for $i$ from $0$ to $l-1$ do
4:   $Y_{{k'_i}^{(0)}} \leftarrow Y_{{k'_i}^{(0)}} \cdot T[i][s_i][{k'_i}^{(1)}]$
5: $Y_{|k'_l|} \leftarrow Y_{|k'_l|} \cdot T[l][\mathrm{sgn}(k'_l)]$
6: $W \leftarrow$ size of $c$ in bits
7: for $i$ from $W-1$ downto $0$ do
8:   $X \leftarrow X^2$
9:   for $j$ from $c-1$ downto $1$ do
10:     if bit $i$ of $j$ is non zero then
11:       $X \leftarrow X \cdot Y_j$
12: return $(X)$

The above algorithm can be implemented in a constant time fashion. Indeed, there is no "if" control attached to the digits of the exponent. The algorithm then consists in a constant and regular sequence of multiplications and squarings, as soon as a multiplication by 1 is computed as any other multiplication.

Example 4. We consider the exponent $k = 901644_{10}$ along with the multiplicative splitting recoding computed in Example 3:

$$((-1, 1, 6), (-1, 1, 6), (-1, 3, 7), (1, 3, 1)) \text{ and } k'_4 = 0. \qquad (12)$$

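We present the computation of $g^k$ using Algorithm 9. In terms of storage, one computes the values, for $0 \le i \le \ell - 1$, $T[i][s][j] = g^{R^i \cdot s \cdot |j^{-1}|_R}$ with $1 \le j \le \lfloor R/c \rfloor = 12$, $s \in \{-1, 1\}$. One has the following values of $|j^{-1}|_R$ for $1 \le j \le 12$:

{1, 45, 30, 67, 18, 15, 51, 78, 10, 9, 81, 52}.

This brings us to store the following values in $G$:

$T[i][1] = \{g^{89^i}, g^{89^i \cdot 45}, g^{89^i \cdot 30}, g^{89^i \cdot 67}, g^{89^i \cdot 18}, g^{89^i \cdot 15}, g^{89^i \cdot 51}, g^{89^i \cdot 78}, g^{89^i \cdot 10}, g^{89^i \cdot 9}, g^{89^i \cdot 81}, g^{89^i \cdot 52}\}$
$T[i][-1] = \{g^{-89^i}, g^{-89^i \cdot 45}, g^{-89^i \cdot 30}, g^{-89^i \cdot 67}, g^{-89^i \cdot 18}, g^{-89^i \cdot 15}, g^{-89^i \cdot 51}, g^{-89^i \cdot 78}, g^{-89^i \cdot 10}, g^{-89^i \cdot 9}, g^{-89^i \cdot 81}, g^{-89^i \cdot 52}\}$.

The execution of Algorithm 9 is shown step by step in Table V.

Table V
EXAMPLE OF AN EXECUTION TRACE FOR AN EXPONENTIATION BASED ON MULTIPLICATIVE SPLITTING RECODING WITH R PRIME

| Iter. | Step | Coeff | Value |
|---|---|---|---|
| $i = 0$ | 4 | $s_0 = -1$, $k_0^{(0)} = 1$, $k_0^{(1)} = 6$ | $Y_1 \leftarrow Y_1 \cdot T[0][s_0][k_0^{(1)}] = 1 \cdot g^{-15}$ |
| $i = 1$ | 4 | $s_1 = -1$, $k_1^{(0)} = 1$, $k_1^{(1)} = 6$ | $Y_1 \leftarrow Y_1 \cdot T[1][s_1][k_1^{(1)}] = g^{-15} \cdot g^{-89 \cdot 15} = g^{-1350}$ |
| $i = 2$ | 4 | $s_2 = -1$, $k_2^{(0)} = 3$, $k_2^{(1)} = 7$ | $Y_3 \leftarrow Y_3 \cdot T[2][s_2][k_2^{(1)}] = 1 \cdot g^{-89^2 \cdot 51} = g^{-403971}$ |
| $i = 3$ | 4 | $s_3 = 1$, $k_3^{(0)} = 3$, $k_3^{(1)} = 1$ | $Y_3 \leftarrow Y_3 \cdot T[3][s_3][k_3^{(1)}] = g^{-403971} \cdot g^{89^3 \cdot 1} = g^{300998}$ |
| - | 5 | $k_4 = 0$ | $Y_0 \leftarrow Y_0 \cdot T[\ell][\mathrm{sgn}(k_4)] = g^{59969536}$ |
| - | 7 to 11 | | $g^k = \prod_{j=1}^{c-1} Y_j^{\,j} = g^{3 \cdot 300998 - 1350} = g^{901644}$ |

C. Complexity

Let us now evaluate the complexity of Algorithm 9. Concerning the amount of storage, it consists of $2(\lfloor R/c \rfloor + 1)\ell + 2$ elements of $G$.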
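The numbers of Example 4 can be re-derived by tracking only the exponents of $g$ rather than group elements. The short Python check below recomputes the inverses $|j^{-1}|_R$, the exponents accumulated in $Y_1$ and $Y_3$ in Table V, the final exponent, and the storage count for this toy case. The value $c = 7$ and the rounding $\lfloor R/c \rfloor$ are assumptions consistent with the bound 12 used in the example.

```python
# Exponent bookkeeping for Example 4 (c = 7 is an assumption; Python 3.8+).
R, c, l = 89, 7, 4
recoding = [(-1, 1, 6), (-1, 1, 6), (-1, 3, 7), (1, 3, 1)]

# |j^(-1)|_R for 1 <= j <= floor(R/c)
inverses = [pow(j, -1, R) for j in range(1, R // c + 1)]

# Exponent of g accumulated in each Y_j (steps 3-4 of Algorithm 9)
y_exp = [0] * (c + 1)
for i, (s, k0, k1) in enumerate(recoding):
    if k1:                                  # k1 = 0 encodes a zero digit
        y_exp[k0] += s * inverses[k1 - 1] * R**i

# Final reconstruction g^k = prod_j Y_j^j, on the exponent side
k = sum(j * y_exp[j] for j in range(1, c))

# Storage count 2 * (floor(R/c) + 1) * l + 2, in elements of G
storage = 2 * (R // c + 1) * l + 2

print(inverses)
print(y_exp[1], y_exp[3], k, storage)
```

Under these assumptions the table for this example would hold 106 group elements.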

For the online complexity, we evaluate the cost of each step of Algorithm 9 based on the following:

the multiplications in Step 4 are performed even in case of $Y_{k_i^{(0)}} = 1$, in order to ensure the constant time of the computation; the same applies for Step 5.

The number of operations in the final reconstruction is evaluated as follows: the squaring in Step 8 is not performed in the first loop iteration ($X = 1$); the first multiplication in Step 11 is skipped since it is an assignment. The other multiplications in Step 11 are performed even in case of $Y_j = 1$, again to ensure a constant computation time. We denote by $H$ the sum of the Hamming weights of $j$ for each $j$ from $c - 1$ downto 1 (for loop in Step 9). The value of $H$ for the different values of $c$ can be found in Table II.

The contribution of each step is given in Table VI along with the total complexity.

Table VI
EXPONENTIATION COMPLEXITY AND STORAGE FOR THE PROPOSED APPROACH WITH A PRIME RADIX R RECODING.

| Count | Step | Operation | Complexity |
|---|---|---|---|
| $\ell$ | Step 4 | $Y_{k_i^{(0)}} \cdot T[i][s_i][k_i^{(1)}]$ | 1 M |
| 1 | Step 5 | $Y_{\lvert k_\ell \rvert} \cdot T[\ell][\mathrm{sgn}(k_\ell)]$ | 1 M |
| $(W - 1)$ | Step 8 | $X^2$ | 1 S |
| $(H - 1)$ | Step 11 | $X \cdot Y_j$ | 1 M |
| TOTAL | | | $(\ell + H)$ M $+ (W - 1)$ S |
| TOTAL STORAGE | | | $2(\lfloor R/c \rfloor + 1)\ell + 2$ elements of $G$ |

V. COMPLEXITY AND EXPERIMENTATION COMPARISON

A. Complexity comparison

In Table VII we give the complexities in terms of the number of online operations and storage amount of the state of the art approaches (Section II) and the two proposed approaches of Sections III and IV. All the approaches presented in this table can be implemented in constant time except the Square-and-multiply, the Fixed-base NAF$_w$ and the proposed approach with $R = m_0 m_1$.

Let us first see when the Fixed-base Comb method is better than the Fixed-base Radix-R exponentiation. We denote by $w_C$ the window size of the Comb method and by $w_R$ the one of the Radix-R method. In order to have both methods with the same number of online operations in $G$, we take $w_C = 2 w_R$: in this case, both methods require $t/w_R$ online operations in $G$.
Then, considering the storage amount when $w_C = 2 w_R$, one can see that the Comb method requires $2^{2 w_R}$ while the Radix-R method needs $\frac{t}{w_R} 2^{w_R}$ elements of $G$. In other words, for a fixed number $t/w_R$ of online computation, the Comb method is better than the Radix-R as soon as $2^{w_R} < \frac{t}{w_R}$, which is the case for small $w_R$, i.e., for small amounts of storage.

Table VII
COMPLEXITIES AND STORAGE AMOUNTS OF EXPONENTIATION ALGORITHM, AVERAGE CASE, BINARY EXPONENT LENGTH t.

| Method | Constant time | #Mul | #Squ. | Storage (#values in G) |
|---|---|---|---|---|
| Square-and-mult. (Algo. 1) | no | $t/2$ | $t - 1$ | 0 |
| Fixed-base Radix-R (*) (Algo. 2) | yes | $t/w - 1$ | 0 | $\frac{t}{w} 2^w$ |
| Fixed-base Comb (Algo. 3) | yes | $t/w - 1$ | $t/w - 1$ | $2^w$ |
| Fixed base NAF$_w$ (Algo. 3) | no | t w (1 w w) w+1 | $(e - 1)w$ | t e 2w 2 |
| Proposed (*) with $R = m_0 m_1$ (Algo. 6) | no | m 1 +1 m 1 m 1 + H + 1 | $W - 1$ | $(2^w/m_1 + 1)\frac{t}{w} + 1$ |
| Proposed (*) with $R$ prime (Algo. 9) | yes | $t/w + H$ | $W - 1$ | $(2^{w+1}/c + 1)\frac{t}{w} + 1$ |

(*) We assume that $R$ is a $w$-bit integer.

If we now consider the Fixed-base NAF$_w$, we can notice that it does not compare favorably with the Radix-R approach. Indeed, for $e = 1$ we would have almost the same number of online multiplications whereas the amount of data in the NAF$_w$ is larger by a factor of $w$. For larger values of $e$ the number of squarings would increase quickly, rendering the approach not competitive. Moreover, the Fixed-base NAF$_w$ has the major drawback of not being constant time.

It is more difficult to formally compare the proposed approaches with the Comb and Radix-R approaches. Indeed, they involve a third parameter ($c$ or $m_1$), which means that for a fixed number of online operations, we would have to find the proper parameter which minimizes the amount of storage. We can still notice that for a given $c$ (resp. $m_1$) we divide by $c$ (resp. $m_1$) the amount of storage compared to the Radix-R approach while having an increase of online computation ($H$ and $W$). This means that the proposed approaches can be competitive only for small $c$ and $m_1$.

To have a clearer idea of the impact of the proposed approach, we follow the strategy used in [17]. Indeed, for practical sizes of group and exponent and for different levels of online operations, we evaluate the best choice of parameters which minimizes the amount of precomputation.
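The Comb versus Radix-R comparison made at the start of this subsection (equal online cost when $w_C = 2 w_R$, then a comparison of $2^{2 w_R}$ against $\frac{t}{w_R} 2^{w_R}$) can be evaluated numerically. The Python sketch below simply applies those two storage expressions; the function names and the tested window range are illustrative choices, and rounding of $t/w_R$ is ignored as in the text.

```python
# Storage comparison at equal online cost (w_C = 2 * w_R), elements of G.

def storage_comb(w_r):
    """Comb storage with window w_C = 2 * w_R: 2^(2 * w_R)."""
    return 2 ** (2 * w_r)


def storage_radix(t, w_r):
    """Radix-R storage: (t / w_R) * 2^w_R (no rounding, as in the text)."""
    return t / w_r * 2 ** w_r


def comb_wins(t, max_w=16):
    """Window sizes w_R for which Comb needs less storage than Radix-R."""
    return [w for w in range(1, max_w + 1)
            if storage_comb(w) < storage_radix(t, w)]


for t in (224, 256, 384, 512):
    print(t, comb_wins(t))
```

The returned lists are the windows satisfying $2^{w_R} < t/w_R$, which stay small for every NIST key size, in line with the statement that Comb is preferable only for small amounts of storage.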
In the sequel we give the results for DSA and ECDSA, for the fields and curves recommended by the NIST.

B. Complexities and timings for modular exponentiation

In this subsection we focus on exponentiation in $((\mathbb{Z}/p\mathbb{Z})^*, \times)$ used in DSA. We evaluate and compare the complexities of the best methods of the literature, i.e., Fixed-base Comb (Algorithm 3) and Fixed-base Radix-R (Algorithm 2), with the complexity of our proposed approaches based on a multiplicative splitting recoding of the exponent (Algorithm 6 for $R = m_0 m_1$ and Algorithm 9 for $R$ prime). In the remainder of this subsection, we provide complexity evaluations in terms of modular multiplications MM, under the assumption that a modular squaring costs MS = 0.86 MM, which is the average value of our implementations for the NIST DSA recommended field sizes. We warn the reader to keep in mind that the Fixed-base Comb, Radix-R and Algorithm 9 are constant time, and that Algorithm 6 is not, i.e., it is the only one that is weak against timing attacks.

The NIST provides recommended key sizes and corresponding field sizes (respectively the size of the primes $q$ and $p$, see NIST SP800-57 [4]). These standardized sizes are as follows:

Table VIII
NIST RECOMMENDED KEY AND FIELD SIZES

| Security level | 80 | 112 | 128 | 192 | 256 |
|---|---|---|---|---|---|
| Key size (bits) | 160 | 224 | 256 | 384 | 512 |
| Field size (bits) | 1024 | 2048 | 3072 | 7680 | 15360 |

Fig. 1 gives the general behavior of the four algorithms in terms of storage (y axis) with respect to the number of online operations (x axis). In the figure, we present three of the field sizes recommended in the NIST standards (see [4]); the behavior is roughly the same for all sizes, although the benefit of our approach with $R = m_0 m_1$ is lower for smaller sizes. One can see that the Fixed-base Comb method is the best for small storage amounts. Our $m_0 m_1$ approach (Algorithm 6) is better for larger amounts of storage; however, the Fixed-base Radix-R method is the best when the storage increases further.

One can see that the $R$ prime multiplicative splitting approach (Algorithm 9) is less efficient than the $R = m_0 m_1$ one for small storage amounts. The reason is that it requires some additional computations to get a constant time execution, while the $m_0 m_1$ approach is not constant time and is thus slightly more efficient. Nevertheless, one can see a range of storage/complexity trade-offs where the $R$ prime multiplicative splitting approach is the best of the constant-time ones.
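To make the MM-based cost measure concrete, the following Python sketch evaluates the online cost $(\ell + H)$ M $+ (W - 1)$ S of Algorithm 9 with MS = 0.86 MM. It rests on assumptions that are not spelled out in this section: $\ell$ is taken as the number of radix-$R$ digits of a $t$-bit exponent, $H$ is computed directly from its definition (sum of the Hamming weights of $1, \dots, c - 1$), $W$ is the bit length of $c$, and the function names are hypothetical.

```python
# Online cost of Algorithm 9 in modular-multiplication equivalents (sketch).

def num_digits(t, R):
    """Smallest l with R^l >= 2^t, i.e. radix-R length of a t-bit exponent."""
    l = 0
    while R ** l < 2 ** t:
        l += 1
    return l


def hamming_sum(c):
    """H: sum of the Hamming weights of j for 1 <= j <= c - 1."""
    return sum(bin(j).count("1") for j in range(1, c))


def cost_mm(t, R, c, ms_over_mm=0.86):
    """(l + H) M + (W - 1) S, with one squaring counted as 0.86 MM."""
    l, H, W = num_digits(t, R), hamming_sum(c), c.bit_length()
    return (l + H) + ms_over_mm * (W - 1)


print(cost_mm(224, 97, 7))   # t = 224, (R, c) = (97, 7)
print(cost_mm(256, 97, 5))   # t = 256, (R, c) = (97, 5)
```

Under these assumptions the two calls give about 44.7 and 45.7 MM, which is consistent with the 45 and 46 MM levels listed for these parameters in Table IX.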
Figure 1. Complexity comparison, Fixed base modular exponentiation NIST DSA, key size 256, 384 and 512 bits (field size 3072, 7680 and 15360 bits). [Three panels, "Complexity Comparison Best of Average Case" for t=256, t=384 and t=512; x axis: number of field multiplications; y axis: Required Total Storage #kbytes (logarithmic scale); curves: FixedBaseComb, radix R, m0m1, R-prime.]

Table IX shows a numerical application of the complexity comparison between the Fixed-base Comb (Algorithm 3), the Fixed-base Radix-R (Algorithm 2) and the approaches based on our multiplicative splitting recodings (Algorithm 6 and Algorithm 9). For an equivalent number of MMs, we provide the minimum amount of storage. We can notice the following:

For all key sizes, we do not provide the results for small amounts of storage (values for w < 8). For such storage, the Fixed-base Comb method is the best. One may notice that the Fixed-base Radix-R approach involves the largest storage amount at this complexity level.

Comparison of the two proposed approaches: $R = m_0 m_1$ vs $R$ prime. We would like to evaluate the improvements provided by the new approach (Algorithm 9) compared to Algorithm 6, which was presented at WAIFI 2016. The results in Table IX show that the exponentiations with multiplicative splitting with $R = m_0 m_1$ and $R$ prime are close to each other. The approach with $R = m_0 m_1$ is generally slightly better than the one with $R$ prime but, as noticed earlier, this is the price to pay to get a constant-time algorithm.

Comparison of constant time approaches. We consider the Fixed-base Comb, Radix-R and multiplicative splitting with $R$ prime approaches. A thorough analysis of the complexities shows that the proposed approach is interesting for intermediate levels of online computation. Specifically from Table IX, for a 224 bit key size, one notices that there are not many cases where the proposed multiplicative splitting approach is interesting. However, for the other key sizes t = 256, 384 and 512, one can see a lot of cases where the amount of storage

is reduced by 50% compared to Comb and Radix-R approaches.

Remark 1. One may notice that the largest memory storage sizes exceed the common values of Random Access Memory and, in some cases, the maximum allowed by the malloc function of the standard C library for memory allocation. Nevertheless, the storage savings provided by our method and by the Fixed-base Radix-R one make it possible to keep the level of storage under the limit for lower complexities.

Table IX
STORAGE AMOUNT COMPARISON FOR FIXED-BASE COMB, FIXED-BASE RADIX-R AND MODULAR EXPONENTIATION WITH MULTIPLICATIVE SPLITTING RECODING FOR NIST RECOMMENDED EXPONENT SIZES

Storage is given in kB; the parameters achieving it are given in parentheses.

Key size t = 224 bits

| #MM | Fixed-base Comb (w) | Fixed-base Radix-R (R) | Mult. splitting $R = m_0 m_1$ ($m_0, m_1$) | Mult. splitting R prime (R, c) |
|---|---|---|---|---|
| 45 | 127.5 (w = 9) | 345 (R = 31) | 108 (11, 9) | 240 (97, 7) |
| 37 | 511.5 (w = 11) | 594 (R = 61) | 242 (31, 7) | 541 (179, 5) |
| 30 | 4095.5 (w = 14) | 1386 (R = 179) | 770 (127, 7) | 1205 (179, 5) |
| 24 | 32767.5 (w = 17) | 4230 (R = 677) | 4173 (877, 7) | 4489 (1223, 3) |
| 19 | 524287.5 (w = 21) | 27084 (R = 5417) | 50409 (13441, 5) | 27954 (6211, 2) |

Key size t = 256 bits

| #MM | Fixed-base Comb (w) | Fixed-base Radix-R (R) | Mult. splitting $R = m_0 m_1$ ($m_0, m_1$) | Mult. splitting R prime (R, c) |
|---|---|---|---|---|
| 46 | 383 (w = 10) | 845 (R = 47) | 241 (17, 11) | 494 (97, 5) |
| 39 | 1535 (w = 12) | 1454 (R = 97) | 579 (47, 7) | 1116 (223, 5) |
| 32 | 12287 (w = 15) | 3179 (R = 257) | 2070 (211, 6) | 3084 (409, 3) |
| 26 | 98303 (w = 18) | 9846 (R = 937) | 9642 (1223, 6) | 10207 (1699, 3) |
| 20 | 1572863 (w = 22) | 66676 (R = 8467) | 225482 (37579, 5) | 85558 (12007, 2) |

Key size t = 384 bits

| #MM | Fixed-base Comb (w) | Fixed-base Radix-R (R) | Mult. splitting $R = m_0 m_1$ ($m_0, m_1$) | Mult. splitting R prime (R, c) |
|---|---|---|---|---|
| 63 | 1918 (w = 11) | 4081 (R = 67) | 969 (19, 11) | 2274 (127, 6) |
| 50 | 15358 (w = 14) | 10087 (R = 191) | 3742 (101, 11) | 7182 (433, 5) |
| 41 | 122878 (w = 17) | 26655 (R = 677) | 17284 (541, 6) | 22891 (937, 3) |
| 35 | 983038 (w = 20) | 80357 (R = 2381) | 64768 (2381, 6) | 65837 (3191, 3) |
| 30 | 7864318 (w = 23) | 246070 (R = 8467) | 315053 (13441, 5) | 235255 (13441, 3) |
| 26 | 62914558 (w = 26) | 951217 (R = 37579) | 3256278 (165397, 5) | 1030642 (43973, 2) |
| 24 | 503316478 (w = 29) | 1750756 (R = 74699) | - | - |

Key size t = 512 bits

| #MM | Fixed-base Comb (w) | Fixed-base Radix-R (R) | Mult. splitting $R = m_0 m_1$ ($m_0, m_1$) | Mult. splitting R prime (R, c) |
|---|---|---|---|---|
| 86 | 3836 (w = 11) | 9841 (R = 59) | 1940 (13, 11) | 5004 (163, 9) |
| 73 | 15356 (w = 13) | 17855 (R = 127) | 4747 (41, 10) | 10005 (241, 6) |
| 60 | 122876 (w = 16) | 46775 (R = 409) | 16224 (179, 11) | 29979 (739, 5) |
| 52 | 491516 (w = 18) | 93110 (R = 937) | 54680 (677, 7) | 76505 (1223, 3) |
| 48 | 983036 (w = 19) | 156091 (R = 1699) | 106185 (1489, 10) | 136971 (2381, 3) |
| 41 | 7864316 (w = 22) | 489112 (R = 6211) | 355573 (5417, 7) | 477551 (6211, 2) |
| 35 | 62914556 (w = 25) | 2048419 (R = 30347) | 2113890 (37579, 7) | 1949934 (47269, 3) |

1) Implementation results:

Implementation strategies. We review hereafter the main implementation strategies and test process for modular exponentiation for NIST recommended sizes. This applies to the four considered

exponentiation algorithms. The algorithms were coded in C, compiled with gcc 4.8.3 and run on the same platform.

Multi-precision multiplication and squaring. We used the low level functions performing multi-precision multiplication and squaring of the GMP library as building blocks of our codes (GMP 6.0.0, see GMP library [1]). According to the GMP documentation, the classical schoolbook algorithm is used for small sizes, and Karatsuba and Toom-Cook subquadratic methods for sizes $\ge$ 2048 bits.

Modular reduction. This operation implements the Montgomery representation and modular reduction method, which avoids multi-precision division in the computation of the modular reduction. This approach was presented by Montgomery in [18]. We use the block Montgomery algorithm suggested by Bosselaers et al. in [5]. In this algorithm, the multi-precision operations combine a full size operand with a one word operand and are also available in the GMP library [1].

Multiplicative splitting recoding with $R = m_0 m_1$ and $R$ prime. The conversion in radix-R needs multi-precision divisions. These operations are implemented using the GMP library [1]. The size of these operations decreases along the algorithm, and this is managed through GMP. The other operations are classical long integer operations. At Step 11 in Algorithm 5 (resp. Step 5 in Algorithm 8), an inversion modulo $m_0$ (resp. $R$) is required. This operation is performed using the Extended Euclidean Algorithm, over long integer data. For the considered exponent sizes, the cost of the recoding is negligible. This is explained by the small size of the exponent in comparison with the size of the data processed during the modular exponentiation (see Table VIII). The timings given in the next subsection include this recoding.

Test processing. The tests involve a few hundred datasets, which consist of random exponent inputs and an exponentiation base with the precomputed stored values. We compute 2000 times the corresponding exponentiation for each dataset and keep the minimum number of clock cycles. This avoids the cold-cache effect and system issues.
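The measurement procedure of the test processing paragraph (repeat each exponentiation, keep the minimum per dataset, then average over datasets) can be sketched as a small Python harness. This is only an illustration of the procedure, not the authors' test code: it measures wall-clock nanoseconds with `time.perf_counter_ns` instead of clock cycles, and the function names and dataset layout are hypothetical.

```python
# Timing harness sketch: per-dataset minimum, then mean over datasets.
import time
from statistics import mean


def min_time_ns(func, repeats=2000):
    """Smallest wall-clock duration (ns) of func() over `repeats` runs."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def benchmark(datasets, exponentiate, repeats=2000):
    """Average over datasets of the per-dataset minimum timing."""
    return mean(min_time_ns(lambda d=d: exponentiate(*d), repeats)
                for d in datasets)


# Usage with the built-in modular exponentiation as a stand-in workload:
# each dataset is a (base, exponent, modulus) triple.
datasets = [(3, 901644, 2**61 - 1), (5, 123456789, 2**61 - 1)]
print(benchmark(datasets, pow, repeats=200))
```

Taking the minimum discards runs disturbed by cache misses or scheduling, which is the motivation given above for keeping the minimum number of clock cycles.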
The timings are obtained by averaging the timings of all datasets.

Tests results and comparison. The four considered exponentiation algorithms were coded in C, compiled with gcc 4.8.3 and run on the following platform: the CPU is an Intel Xeon E5-2650 (Ivy Bridge), and the operating system is CentOS 7.0.1406. On this platform, the Random Access Memory is 12.6 GBytes. Note that the performance results include the Radix-R recoding and the multiplicative splitting of the digits for $R = m_0 m_1$ and $R$ prime.

We show the performance results in Fig. 2, which gives a global overview. The implementation results confirm the complexity evaluation for key sizes of 224, 256, 384 and 512 bits. However, the best results are for 384 and 512 bits.

In Table X, we provide the most significant results. The gains shown are roughly of the same order of magnitude as those of the complexity evaluation. In particular, for the largest key size (512 bits), the storage of our approach with $R = m_0 m_1$ is nearly ten times less than the one required with the Fixed-base Comb method, and nearly 14% less than the one required for the Fixed-base Radix-R method, for the same level of clock-cycles. At the same time,

our approach with $R$ prime gives equivalent results for low levels of storage, and better results for higher levels of storage.

Table X
IMPLEMENTATION RESULTS FOR MODULAR EXPONENTIATION IN TERMS OF CLOCK CYCLES AND STORAGE (KB). TEST PERFORMED ON AN INTEL XEON E5-2650 (IVY BRIDGE), GCC 4.8.3, CENTOS 7.0.1406.

The Fixed-base Comb and Radix R columns are the state of the art methods; the $R = m_0 m_1$ and $R$ prime columns are the proposed approach. Times are in clock cycles (#CC) and storage in kB.

| Security level | Level of clock-cycles | Comb: Time | Comb: Storage | w | Radix R: Time | Radix R: Storage | R | $R = m_0 m_1$: Time | $R = m_0 m_1$: Storage | ($m_0, m_1$) | R prime: Time | R prime: Storage | (R, c) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112 bits (key 224 bits, field 2048 bits) | 220000 | 221108 | 1023.5 | 12 | 227838 | 829 | 91 | 219864 | 580 | (89, 6) | 217104 | 1191 | (257, 3) |
| | 207000 | 210074 | 2047.5 | 13 | 206888 | 1324 | 163 | 207072 | 766 | (127, 7) | 206813 | 1553 | (347, 3) |
| | 148000 | 149690 | 65535 | 18 | 147877 | 7289 | 1223 | 146156 | 21599 | (5417, 6) | 149490 | 17661 | (3719, 2) |
| 128 bits (key 256 bits, field 3072 bits) | 505000 | 524539 | 1535 | 12 | 502981 | 1411 | 91 | 501466 | 897 | (79, 6) | 509581 | 1420 | (307, 5) |
| | 450000 | 449397 | 6143 | 14 | 445871 | 2251 | 163 | 446444 | 2056 | (211, 6) | 458936 | 2372 | (307, 3) |
| | 354000 | 356892 | 98303 | 18 | 354640 | 6414 | 571 | 354071 | 12843 | (1721, 7) | 353662 | 15283 | (1699, 2) |
| 192 bits (key 384 bits, field 7680 bits) | 444000 | 4442590 | 1918 | 11 | 4492191 | 3430 | 53 | 4409584 | 1134 | (23, 10) | 4494471 | 2171 | (127, 6) |
| | 353000 | 3554339 | 15358 | 14 | 3524896 | 8290 | 163 | 3551437 | 4164 | (113, 10) | 3534620 | 7100 | (433, 5) |
| | 270000 | 2736341 | 245758 | 18 | 2543480 | 45221 | 1223 | 2743399 | 29961 | (1031, 7) | 2795363 | 31915 | (1381, 3) |
| 256 bits (key 512 bits, field 15360 bits) | 1860000 | 18632429 | 15536 | 13 | 19260731 | 13765 | 91 | 18550238 | 4745 | (41, 10) | 18683547 | 8653 | (257, 7) |
| | 1500000 | 14848261 | 122876 | 16 | 15401002 | 34418 | 163 | 14812616 | 22111 | (257, 11) | 15541482 | 27853 | (641, 5) |
| | 1240000 | 12477816 | 983036 | 19 | 12193232 | 119061 | 1223 | 12499600 | 102820 | (1381, 7) | 12802926 | 101886 | (1699, 3) |

C. Complexities and timings for scalar multiplication

In this subsection, we present complexity results and timings of the fixed base scalar multiplication over elliptic curves recommended by NIST.
1) Complexity comparison: In the fixed-base elliptic curve scalar multiplication case, the main difference with the modular exponentiation is the negligible cost of the inversion of a group element (i.e. an elliptic curve point). This allows us to halve the memory requirements, by only storing the points corresponding to the positive sign $s$ in the recoded coefficients. We provide in Appendix A the version of the scalar multiplication algorithm with multiplicative splitting with $R$ prime which takes advantage of a cheap point subtraction.

When computing the complexities, we noticed that the approach using a multiplicative splitting recoding with $R = m_0 m_1$ was never better than the one with $R$ prime. In addition, the approach with $R = m_0 m_1$ does not provide a constant time computation. That is why we do not consider the approach with $R = m_0 m_1$ in the remainder of this subsection. Specifically, we only deal with constant-time approaches: Fixed-base Comb, Radix-R and multiplicative splitting with $R$ prime.

We compare explicit complexities for practical situations, which are the three elliptic curves standardized by NIST: P256, P384, P521. One can find in [12] the Weierstrass curve equations of these three NIST curves, which are reviewed in the appendix. For the arithmetic on these curves, we use the Jacobian coordinate system, which provides the fastest curve operations. We use the complexities in terms of operations in $\mathbb{F}_p$ of point addition and