Fast Variants of RSA

Dan Boneh
dabo@cs.stanford.edu

Hovav Shacham
hovav@cs.stanford.edu

Abstract

We survey four variants of RSA designed to speed up RSA decryption and signing. We only consider variants that are backwards compatible in the sense that a system using one of these variants can interoperate with systems using standard RSA.

1 Introduction

RSA [12] is the most widely deployed public key cryptosystem. It is used for securing web traffic, e-mail, and some wireless devices. Since RSA is based on arithmetic modulo large numbers it can be slow in constrained environments. For example, 1024-bit RSA decryption on a small handheld device such as the PalmPilot III can take as long as 30 seconds. Similarly, on a heavily loaded web server, RSA decryption significantly reduces the number of SSL requests per second that the server can handle. Typically, one improves RSA's performance using special-purpose hardware. Current RSA coprocessors can perform as many as 10,000 RSA decryptions per second (using a 1024-bit modulus) and even faster processors are coming out.

In this paper we survey four simple variants of RSA that are designed to speed up RSA decryption in software. We emphasize backwards compatibility: a system using one of these variants for fast RSA decryption should be able to interoperate with systems that are built for standard RSA. Moreover, existing Certificate Authorities must be able to respond to a certificate request for a variant-RSA public key.

We begin the paper with a brief review of RSA. We then describe the following variants for speeding up RSA decryption:

Batch RSA [8]: do a number of RSA decryptions for approximately the cost of one.
Multi-factor RSA [7, 14]: use a modulus of the form $N = pqr$ or $N = p^2 q$.
Rebalanced RSA [16]: speed up RSA decryption by shifting most of the work to the encrypter.

The security of these variants is an open research problem. We cannot show that an attack on these variants would imply an attack on the standardized version of RSA (as described, e.g., in ANSI X9.31).
Therefore, when using these variants, one can only rely on the fact that so far none of them has been shown to be weak.

The RSA trapdoor permutation is used for both public key encryption and digital signatures. Since the exact application of RSA is orthogonal to the discussion in this paper we use terminology consistent with the application to public key encryption. All the RSA variants we discuss apply equally well to digital signatures, where they speed up RSA signing.

1.1 Review of the basic RSA system

We review the basic RSA public key system and refer to [10] for more information. We describe three constituent algorithms: key generation, encryption, and decryption.

Key generation: The key generation algorithm takes a security parameter $n$ as input. Throughout the paper we use $n = 1024$ as the standard security parameter. The algorithm generates two $(n/2)$-bit primes, $p$ and $q$, and sets $N \leftarrow pq$. Next, it picks some small value $e$ that is relatively prime to $\varphi(N) = (p-1)(q-1)$. The value $e$ is called the encryption exponent, and is usually chosen as $e = 65537$. The RSA public key consists of the two integers $\langle N, e \rangle$. The RSA private key is an integer $d$ satisfying $e \cdot d = 1 \bmod \varphi(N)$. Typically, one sends the public key $\langle N, e \rangle$ to a Certificate Authority (CA) to obtain a certificate for it.

Encryption: To encrypt a message $X$ using an RSA public key $\langle N, e \rangle$, one first formats the bit-string $X$ to obtain an integer $M$ in $\mathbb{Z}_N = \{0, \ldots, N-1\}$. This formatting is often done using the PKCS #1 standard [1, 9]. The ciphertext is then computed as $C \leftarrow M^e \bmod N$. (Other methods for formatting $X$ prior to encryption are described elsewhere in this issue.)

Decryption: To decrypt a ciphertext $C$ the decrypter uses its private key $d$ to compute the $e$-th root of $C$ by computing $M \leftarrow C^d \bmod N$. Since both $d$ and $N$ are large numbers (each approximately $n$ bits long) this is a lengthy computation for the decrypter. The formatting operation from the encryption algorithm is then reversed to obtain the original bit-string $X$ from $M$. Note that $d$ must be a large number (on the order of $N$) since otherwise the RSA system is insecure [3, 16].

It is standard practice to employ the Chinese Remainder Theorem (CRT) for RSA decryption. Rather than compute $M \leftarrow C^d \pmod{N}$, one evaluates:

$$M_p \leftarrow C_p^{d_p} \pmod{p}, \qquad M_q \leftarrow C_q^{d_q} \pmod{q}$$

Here $C_p = C \bmod p$, $C_q = C \bmod q$, $d_p = d \bmod (p-1)$ and $d_q = d \bmod (q-1)$. Then one uses the CRT to calculate $M$ from $M_p$ and $M_q$. This is approximately four times as fast as evaluating $C^d \bmod N$ directly [10, p. 613].
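As a rough illustration of the CRT decryption just described, the following Python sketch performs the two half-size exponentiations and recombines the results. The function name `crt_decrypt`, the toy primes, and the Garner-style recombination step are illustrative assumptions, not part of the paper; real keys use primes of about 512 bits and formatted (padded) messages. The sketch relies on `pow(x, -1, m)` for modular inverses, which needs Python 3.8 or later.

```python
def crt_decrypt(c, p, q, d):
    """Decrypt c using the CRT: two half-size exponentiations, then recombine."""
    d_p, d_q = d % (p - 1), d % (q - 1)
    m_p = pow(c % p, d_p, p)          # M_p = C_p^{d_p} mod p
    m_q = pow(c % q, d_q, q)          # M_q = C_q^{d_q} mod q
    # Recombine: the unique M in [0, pq) with M = m_p mod p and M = m_q mod q.
    q_inv = pow(q, -1, p)
    h = (m_p - m_q) * q_inv % p
    return m_q + q * h


# Toy parameters (far too small for real use; chosen only for readability).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent
m = 65
c = pow(m, e, n)                      # textbook encryption, no padding
recovered = crt_decrypt(c, p, q, d)
```

Here `recovered` equals both `m` and the direct computation `pow(c, d, n)`; the saving comes from working with half-size moduli and half-size exponents.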
2 Batch RSA

Fiat [8] observed that, when using small public exponents $e_1$ and $e_2$ for the same modulus $N$, it is possible to decrypt two ciphertexts for approximately the price of one. Suppose $C_1$ is a ciphertext obtained by encrypting some $M_1$ using the public key $\langle N, 3 \rangle$, and $C_2$ is a ciphertext for some $M_2$ using $\langle N, 5 \rangle$. To decrypt, we must compute $C_1^{1/3}$ and $C_2^{1/5} \bmod N$. Fiat observed that by setting $A = (C_1^5 \cdot C_2^3)^{1/15}$ we obtain:

$$C_1^{1/3} = \frac{A^{10}}{C_1^3 \cdot C_2^2} \quad\text{and}\quad C_2^{1/5} = \frac{A^6}{C_1^2 \cdot C_2} \qquad (1)$$

Hence, at the cost of computing a single 15th root and some additional arithmetic, we are able to decrypt both $C_1$ and $C_2$. Computing a 15th root takes the same time as a single RSA decryption.

This batching technique is only worthwhile when the public exponents $e_1$ and $e_2$ are small (e.g., 3 and 5). Otherwise, the extra arithmetic required is too expensive. Also, one can only batch-decrypt ciphertexts encrypted using the same modulus and distinct public exponents. This is essential: it is known [13, Appendix A] that one cannot apply such algebraic techniques to batch the decryption of two ciphertexts encrypted with the same public key (e.g., we cannot batch compute $C_1^{1/3}$ and $C_2^{1/3}$).

Fiat generalized the above observation to the decryption of a batch of $b$ RSA ciphertexts. We have $b$ pairwise relatively prime public exponents $e_1, \ldots, e_b$, all sharing a common modulus $N$. Furthermore, we have $b$ encrypted messages $C_1, \ldots, C_b$, where $C_i$ is encrypted using the exponent $e_i$. We wish to compute $M_i = C_i^{1/e_i}$ for $i = 1, \ldots, b$. Fiat describes this $b$-batch process using a binary tree. For small values of $b$ ($b \le 8$), one can use a direct generalization of (1). One sets $e \leftarrow \prod_i e_i$ and $A_0 \leftarrow \prod_i C_i^{e/e_i}$ (where the indices range over $1, \ldots, b$). Then one calculates $A \leftarrow A_0^{1/e} = \prod_{i=1}^{b} C_i^{1/e_i}$. For each $i$ one computes $M_i$ as:

$$M_i = C_i^{1/e_i} = \frac{A^{\alpha_i}}{C_i^{(\alpha_i - 1)/e_i} \cdot \prod_{j \ne i} C_j^{\alpha_i/e_j}} \quad\text{where}\quad \alpha_i = \begin{cases} 1 \bmod e_i \\ 0 \bmod e_j & (\text{for } j \ne i) \end{cases} \qquad (2)$$

This $b$-batch requires $b$ modular inversions whereas Fiat's tree based method requires $2b$ modular inversions, but fewer auxiliary multiplications. Note that since $b$ and the $e_i$'s are small the exponents in Equation (2) are also small.

2.1 Improving the performance of batch RSA

In [13] the authors show how to use batch RSA within the Apache web server to improve the performance of the SSL handshake. This requires changing the web server architecture. They also describe several natural improvements to batch RSA. We mention a few of these improvements here.

Batch division: Modular inversion is much slower than modular multiplication. Using a trick due to Montgomery we compute all $b$ inversions in the batch algorithm for the cost of a single inversion and a few more multiplications. The idea is to invert $x$ and $y$ by computing $\alpha \leftarrow (xy)^{-1}$ and setting $x^{-1} \leftarrow y \cdot \alpha$ and $y^{-1} \leftarrow x \cdot \alpha$. Thus we obtain the inverses of both $x$ and $y$ at the cost of a single modular inversion and three multiplications. More generally, we use the following fact [6, p. 481]:

Fact. Let $x_1, \ldots, x_n$ be elements of $\mathbb{Z}_N^*$. All $n$ inverses $x_1^{-1}, \ldots, x_n^{-1}$ can be obtained at the cost of one inversion and $3n$ multiplications.

Consequently, only a single modular inversion is required for the entire batching procedure.
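The batch-division trick can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in [13]: the function name `batch_inverse` and the toy modulus are hypothetical, every input is assumed to be invertible modulo `n`, and `pow(x, -1, n)` requires Python 3.8 or later. The sketch uses one inversion and $3(n-1)$ multiplications for $n$ inputs, which is within the bound stated in the Fact.

```python
def batch_inverse(xs, n):
    """Invert every element of xs modulo n using a single modular inversion."""
    k = len(xs)
    # prefix[i] = xs[0] * xs[1] * ... * xs[i] mod n
    prefix = [xs[0] % n]
    for x in xs[1:]:
        prefix.append(prefix[-1] * x % n)

    inv = pow(prefix[-1], -1, n)      # the only inversion: (x_0 ... x_{k-1})^{-1}
    out = [0] * k
    for i in range(k - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % n   # strip x_0..x_{i-1}, leaving xs[i]^{-1}
        inv = inv * xs[i] % n              # inv is now (x_0 ... x_{i-1})^{-1}
    out[0] = inv
    return out


n = 3233                              # toy modulus 61 * 53
xs = [2, 3, 5, 7]
inverses = batch_inverse(xs, n)
```

For two inputs this reduces to the $x^{-1} \leftarrow y \cdot \alpha$, $y^{-1} \leftarrow x \cdot \alpha$ rule given above.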
Global Chinese Remainder: In Section 1.1 we mentioned that RSA decryption uses the CRT to speed up the computation of $C^d \bmod N$. This idea extends naturally to batch decryption. We run the entire batching algorithm modulo $p$, and again modulo $q$, then use the CRT on each of the $b$ pairs $\langle C_i^{1/e_i} \bmod p,\ C_i^{1/e_i} \bmod q \rangle$ to obtain the $b$ decryptions $M_i = C_i^{1/e_i} \bmod N$.

Simultaneous Multiple Exponentiation: Simultaneous multiple exponentiation [10, Section 14.6] is a method for calculating $a^u \cdot b^v \bmod m$ without first evaluating $a^u$ and $b^v$. It requires approximately as many multiplications as does a single exponentiation with the larger of $u$ or $v$ as exponent. Such products of powers are a large part of the batching algorithm. Simultaneous multiple exponentiation cuts the time required to perform them by close to 30%.

2.2 Performance of batch RSA

Table 1 lists the running time for stand-alone batch-RSA decryption, using OpenSSL 0.9.5 on a machine with a 750 MHz Pentium III and 256 MB RAM, running Debian Linux. In all experiments, the smallest possible values for the encryption exponents $e_i$ were used.

batch size       key size (bits)
                 768      1024     2048
(unbatched)      4.67     8.38     52.96
2                3.09     5.27     29.43
4                1.93     3.18     16.41
8                1.55     2.42     10.81

Table 1: RSA decryption time, in milliseconds, as a function of batch and key size.

With standard 1024-bit keys, batching improves performance significantly. With $b = 4$, RSA decryption is accelerated by a factor of 2.6; with $b = 8$, by a factor of almost 3.5. Note that a batch size of more than eight is probably not useful for common applications, since waiting for many decryption requests to be queued can significantly increase latency.

We also mention batch-RSA performance as a component of a larger system: a web server handling SSL traffic. An architecture for such a system is described in [13]. Table 2 gives the number of full SSL handshakes per second that the batch-RSA web server can handle, when bombarded with concurrent HTTP HEAD requests by a test client. Here "server load" is the number of simultaneous connection threads the client makes to the server. Under heavy load, batch RSA can improve the number of full SSL handshakes per second by a factor of approximately 2.5.

batch size       server load
                 16       32       48
(unbatched)      105      98       98
2                149      141      134
4                218      201      187
8                274      248      227

Table 2: SSL handshakes per second as a function of batch size, with 1024-bit keys.

2.3 The Downside of Batch RSA

Batch RSA can lead to a significant improvement in RSA decryption time. Nevertheless, there are a few difficulties with using the batching technique:

When using batch RSA, the decryption server must maintain at least as many RSA certificates as there are distinct keys in a batch. Unfortunately, current Certificate Authorities charge per certificate issued regardless of the public key in the certificate. Hence, the cost of certificates might outweigh the benefits in performance.

For optimal performance, batching requires RSA public keys with very small public exponents ($e = 3, 5, 7, 11, \ldots$). Although all known attacks on the resulting system are prevented by appropriate padding, RSA as usually deployed uses a larger public exponent ($e = 65537$).
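To make Equation (1) of Section 2 concrete, the following Python sketch batch-decrypts two ciphertexts encrypted under exponents 3 and 5 with a shared toy modulus. The function name `batch_decrypt_3_5` and the primes 17 and 23 are illustrative assumptions; the primes are chosen so that 15 is invertible modulo $\varphi(N)$, the messages are assumed to be coprime to $N$, and no padding is applied. For brevity the sketch performs two separate inversions rather than combining them with Montgomery's trick, and it needs Python 3.8 or later for `pow(x, -1, n)`.

```python
def batch_decrypt_3_5(c1, c2, p, q):
    """Recover M1 = C1^(1/3) and M2 = C2^(1/5) mod N with one 15th root."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d15 = pow(15, -1, phi)                           # exponent computing 15th roots

    # A = (C1^5 * C2^3)^(1/15): the single full-size exponentiation.
    a = pow(pow(c1, 5, n) * pow(c2, 3, n) % n, d15, n)

    # Equation (1): divide out the unwanted small powers.
    m1 = pow(a, 10, n) * pow(pow(c1, 3, n) * pow(c2, 2, n) % n, -1, n) % n
    m2 = pow(a, 6, n) * pow(pow(c1, 2, n) * c2 % n, -1, n) % n
    return m1, m2


p, q = 17, 23                  # toy primes: neither p - 1 nor q - 1 is divisible by 3 or 5
n = p * q
c1 = pow(42, 3, n)             # M1 = 42 encrypted with e1 = 3
c2 = pow(99, 5, n)             # M2 = 99 encrypted with e2 = 5
m1, m2 = batch_decrypt_3_5(c1, c2, p, q)
```

The only exponentiation with a full-size exponent is the one producing `a`; every other power uses an exponent of at most 10.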
3 Multi-factor RSA

The second RSA variant is based on modifying the structure of the RSA modulus. Here there are two proposals. The first [7] uses a modulus of the form $N = pqr$. When $N$ is 1024 bits, each prime is approximately 341 bits. We refer to this as multi-prime RSA. The second proposal [14] uses an RSA modulus of the form $N = p^2 q$ and leads to an even greater speedup. Both methods are fully backwards compatible since the resulting public keys are indistinguishable from standard RSA public keys (where $N = pq$).

3.1 Multi-prime RSA: N = pqr

We begin with multi-prime RSA [7]. We describe key generation, encryption, and decryption. We then discuss the performance of the scheme and analyze its security.

Key generation: The key generation algorithm takes as input a security parameter $n$ and an additional parameter $b$. It generates an RSA public/private key pair as follows:

Step 1: Generate $b$ distinct primes $p_1, \ldots, p_b$, each $\lfloor n/b \rfloor$ bits long. Set $N \leftarrow \prod_{i=1}^{b} p_i$. For a 1024-bit modulus we can use at most $b = 3$ (i.e., $N = pqr$), for security reasons discussed below.
Step 2: Pick the same $e$ used in standard RSA public keys, namely $e = 65537$. Then compute $d = e^{-1} \bmod \varphi(N)$. As usual, we must ensure that $e$ is relatively prime to $\varphi(N) = \prod_{i=1}^{b} (p_i - 1)$.

The public key is $\langle N, e \rangle$; the private key is $d$.

Encryption: Given a public key $\langle N, e \rangle$, the encrypter encrypts exactly as in standard RSA.

Decryption: Decryption is done using the Chinese Remainder Theorem (CRT). Let $r_i = d \bmod (p_i - 1)$. To decrypt a ciphertext $C$, one first computes $M_i = C^{r_i} \bmod p_i$ for each $i$, $1 \le i \le b$. One then combines the $M_i$'s using the CRT to obtain $M = C^d \bmod N$. The CRT step takes negligible time compared to the $b$ exponentiations.

Performance. We compare the decryption work using the above scheme to the work done when decrypting a normal RSA ciphertext. Recall that standard RSA decryption using CRT requires two full exponentiations modulo $n/2$-bit numbers. In multi-prime RSA, decryption requires $b$ full exponentiations modulo $n/b$-bit numbers. Using basic algorithms, computing $x^d \bmod p$ takes time $O(\log d \cdot \log^2 p)$. When $d$ is on the order of $p$ the running time is $O(\log^3 p)$. Therefore, the asymptotic speedup of multi-prime RSA over standard RSA is simply:

$$\frac{2 \cdot (n/2)^3}{b \cdot (n/b)^3} = \frac{b^2}{4}$$

For 1024-bit RSA, we can use at most $b = 3$ (i.e., $N = pqr$), which gives a theoretical speedup of approximately 2.25 over standard RSA decryption.
Our experiments (implemented using the GMP bignum library) show that in practice we get a speed-up by a factor of 1.73 over standard RSA.

Security. The security of multi-factor RSA depends on the difficulty of factoring integers of the form $N = p_1 \cdots p_b$ for $b > 2$. The fastest known factoring algorithm (the number field sieve) cannot take advantage of this special structure of $N$. However, one has to make sure that the prime factors of $N$ do not fall within the range of the Elliptic Curve Method (ECM), which is analyzed in [15]. Currently, 256-bit prime factors are considered within the bounds of ECM, since the work to find such factors is within range of the work needed for the RSA-512 factoring project [5]. Consequently, for 1024-bit moduli one should not use more than three factors.
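The multi-prime key generation and decryption steps of Section 3.1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' GMP implementation: the helper names `crt_combine` and `multiprime_decrypt` are hypothetical, the three primes are tiny toy values instead of 341-bit primes, and messages are unpadded. It uses `math.prod` and `pow(x, -1, m)`, both available from Python 3.8.

```python
from math import prod


def crt_combine(residues, moduli):
    """Return x in [0, prod(moduli)) with x = residues[i] mod moduli[i]."""
    n = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        n_i = n // m                           # product of the other moduli
        x += r * n_i * pow(n_i, -1, m)         # contributes r mod m, 0 mod the others
    return x % n


def multiprime_decrypt(c, primes, d):
    """Decrypt with b small exponentiations, one per prime, then apply the CRT."""
    residues = [pow(c % p, d % (p - 1), p) for p in primes]   # M_i = C^{r_i} mod p_i
    return crt_combine(residues, primes)


primes = [11, 17, 23]                          # toy stand-ins for three 341-bit primes
n = prod(primes)
e = 65537
phi = prod(p - 1 for p in primes)
d = pow(e, -1, phi)                            # Step 2 of key generation
m = 1234
c = pow(m, e, n)                               # encryption is unchanged from standard RSA
recovered = multiprime_decrypt(c, primes, d)
```

Each exponentiation works with an $n/b$-bit modulus and an $n/b$-bit exponent, which is where the $b^2/4$ estimate comes from.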

3.2 Multi-power RSA: N = p^2 q

One can further speed up RSA decryption using a modulus of the form $N = p^{b-1} q$ where $p$ and $q$ are $n/b$ bits each [14]. When $N$ is 1024 bits long we can use at most $b = 3$, i.e., $N = p^2 q$. The two primes $p$, $q$ are then each 341 bits long.

Key generation: The key generation algorithm takes as input a security parameter $n$ and an additional parameter $b$. It generates an RSA public/private key pair as follows:

Step 1: Generate two distinct $\lfloor n/b \rfloor$-bit primes, $p$ and $q$, and compute $N \leftarrow p^{b-1} \cdot q$.
Step 2: Use the same public exponent $e$ used in standard RSA public keys, namely $e = 65537$. Compute $d \leftarrow e^{-1} \bmod (p-1)(q-1)$.
Step 3: Compute $r_1 \leftarrow d \bmod (p-1)$ and $r_2 \leftarrow d \bmod (q-1)$.

The public key is $\langle N, e \rangle$; the private key is $\langle p, q, r_1, r_2 \rangle$.

Encryption: Same as in standard RSA.

Decryption: To decrypt a ciphertext $C$ using the private key $\langle p, q, r_1, r_2 \rangle$ one does:

Step 1: Compute $M_1 \leftarrow C^{r_1} \bmod p$ and $M_2 \leftarrow C^{r_2} \bmod q$. Thus $M_1^e = C \bmod p$ and $M_2^e = C \bmod q$.
Step 2: Using Hensel lifting [6, p. 137] construct an $M_1'$ such that $(M_1')^e = C \bmod p^{b-1}$. Hensel lifting is much faster than a full exponentiation modulo $p^{b-1}$.
Step 3: Using CRT, compute an $M \in \mathbb{Z}_N$ such that $M = M_1' \bmod p^{b-1}$ and $M = M_2 \bmod q$. Then $M^e = C \bmod N$, so $M$ is a proper decryption of $C$.

Performance. We compare the work required to decrypt using multi-power RSA to that required for standard RSA. For multi-power RSA, decryption takes two full exponentiations modulo $(n/b)$-bit numbers, and $b - 2$ Hensel liftings. Since Hensel lifting is much faster than exponentiation, we focus on the time for the two exponentiations. As noted before, a full exponentiation using basic modular arithmetic algorithms takes cubic time in the size of the modulus. So, the speedup of multi-power RSA over standard RSA is approximately:

$$\frac{2 \cdot (n/2)^3}{2 \cdot (n/b)^3} = \frac{b^3}{8}$$

For 1024-bit RSA, $b$ should again be at most three (i.e., $N = p^2 q$), giving a theoretical speedup of about 3.38 over standard RSA decryption. Our experiments (implemented using GMP and taking $e = 65537$) show that in practice we get a speed-up by a factor of 2.30 over standard RSA.
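The three decryption steps for $N = p^2 q$ can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the function name `multipower_decrypt` is hypothetical, the primes are toy values, the message is assumed to be coprime to $p$, and $e$ is assumed to be coprime to $p$ as well as to $(p-1)(q-1)$. The Hensel step is the usual one-step Newton correction for $f(x) = x^e - C$: writing $M_1' = M_1 + p\,t$, it solves $e M_1^{e-1} t = (C - M_1^e)/p \pmod p$. Python 3.8 or later is needed for `pow(x, -1, m)`.

```python
def multipower_decrypt(c, p, q, e, r1, r2):
    """Decrypt c for N = p^2 * q (the b = 3 case) using one Hensel lifting step."""
    p2 = p * p
    # Step 1: two exponentiations with (n/b)-bit moduli and exponents.
    m1 = pow(c % p, r1, p)
    m2 = pow(c % q, r2, q)

    # Step 2: lift m1 from a root mod p to a root mod p^2.
    err = (c - pow(m1, e, p2)) % p2            # divisible by p because m1^e = c mod p
    deriv = e * pow(m1, e - 1, p) % p          # f'(m1) mod p for f(x) = x^e - c
    t = (err // p) * pow(deriv, -1, p) % p
    m1_lifted = m1 + p * t

    # Step 3: CRT on the moduli p^2 and q.
    h = (m2 - m1_lifted) * pow(p2, -1, q) % q
    return m1_lifted + p2 * h


p, q, e = 11, 17, 65537                        # toy stand-ins for two 341-bit primes
n = p * p * q
d = pow(e, -1, (p - 1) * (q - 1))              # Step 2 of key generation
r1, r2 = d % (p - 1), d % (q - 1)              # Step 3 of key generation
m = 1000
c = pow(m, e, n)
recovered = multipower_decrypt(c, p, q, e, r1, r2)
```

The lifting step costs one exponentiation with the small public exponent $e$ plus an inversion modulo $p$, which is why it is cheap compared with the two private-exponent exponentiations in Step 1.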
Security. The security of multi-power RSA depends on the difficulty of factoring integers of the form $N = p^{b-1} q$. As for multi-prime RSA, one has to make sure that the prime factors of $N$ do not fall within the capabilities of ECM (and the ECM improvement for $N = p^2 q$ [11]). Consequently, for 1024-bit moduli one can use at most $b = 3$, i.e., $N = p^2 q$. In addition, we note that the Lattice Factoring Method (LFM) [4], designed to factor integers of the form $N = p^u q$ for large $u$, cannot efficiently factor integers of the form $N = p^2 q$ when $N$ is 1024 bits long.

4 Rebalanced RSA

In standard RSA, encryption and signature verification are much faster than decryption and signature generation. In some applications, one would like to have the reverse behavior. For example, when a cell phone needs to generate an RSA signature that will be later verified on a fast server one would like signing to be easier than verifying. Similarly, SSL web browsers (doing RSA encryption) typically have idle cycles to burn whereas SSL web servers (doing RSA decryption) are overloaded.

In this section we describe a variant of RSA that enables us to rebalance the difficulty of encryption and decryption: we speed up RSA decryption by shifting the work to the encrypter. This variant is based on a proposal by Wiener [16] (see also [2]). Note that we cannot simply speed up RSA decryption by using a small value of $d$ since as soon as $d$ is less than $N^{0.292}$ RSA is insecure [16, 3]. The trick is to choose $d$ such that $d$ is large (on the order of $N$), but $d \bmod (p-1)$ and $d \bmod (q-1)$ are small numbers. As before, we describe key generation, encryption, and decryption.

Key generation: The key generation algorithm takes two security parameters $n$ and $k$ where $k \le n/2$. Typically $n = 1024$ and $k = 160$. It generates an RSA key as follows:

Step 1: Generate two distinct $(n/2)$-bit primes $p$ and $q$ with $\gcd(p-1, q-1) = 2$. Compute $N \leftarrow pq$.
Step 2: Pick two random $k$-bit values $r_1$ and $r_2$ such that $\gcd(r_1, p-1) = 1$, $\gcd(r_2, q-1) = 1$, and $r_1 = r_2 \bmod 2$.
Step 3: Find a $d$ such that $d = r_1 \bmod (p-1)$ and $d = r_2 \bmod (q-1)$.
Step 4: Compute $e \leftarrow d^{-1} \bmod \varphi(N)$.

The public key is $\langle N, e \rangle$; the private key is $\langle p, q, r_1, r_2 \rangle$.

We need to explain how to find $d$ in Step 3. One usually uses the Chinese Remainder Theorem (CRT). Unfortunately, $p-1$ and $q-1$ are not relatively prime (they are both even) and consequently the theorem does not apply. However, $(p-1)/2$ is relatively prime to $(q-1)/2$. Furthermore, $r_1 = r_2 \bmod 2$. Let $a = r_1 \bmod 2$.
Then using CRT we can find an element $d'$ such that

$$d' = \frac{r_1 - a}{2} \pmod{\frac{p-1}{2}} \quad\text{and}\quad d' = \frac{r_2 - a}{2} \pmod{\frac{q-1}{2}}$$

Now, observe that the required $d$ in Step 3 is simply $d = 2d' + a$. Indeed, $d = r_1 \bmod (p-1)$ and $d = r_2 \bmod (q-1)$.

In Step 4, we must justify why $d$ is invertible modulo $\varphi(N)$. Recall that $\gcd(r_1, p-1) = 1$ and $\gcd(r_2, q-1) = 1$. It follows that $\gcd(d, p-1) = 1$ and $\gcd(d, q-1) = 1$. Consequently $\gcd(d, (p-1)(q-1)) = 1$. Hence, $d$ is invertible modulo $\varphi(N) = (p-1)(q-1)$.

For security reasons described below we take $k = 160$, although other larger values are acceptable. Note that $e$ is very large, on the order of $N$. This is unlike standard RSA, where $e$ typically equals 65537. All Certificate Authorities we tested were willing to generate certificates for such RSA public keys.

Encryption: Encryption using the public key $\langle N, e \rangle$ is identical to encryption in standard RSA. The only issue is that since $e$ is much larger than in standard RSA, the encrypter must be willing to accept such public keys. At the time of this writing all browsers we tested were willing to accept such keys. The only exception is Microsoft's Internet Explorer (IE): IE allows a maximum of 32 bits for $e$.
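Steps 3 and 4 of the rebalanced key generation can be sketched in Python as follows. The function name `rebalanced_keygen` and the toy values of $p$, $q$, $r_1$, $r_2$ are illustrative assumptions (a real key would use 512-bit primes and 160-bit $r_1$, $r_2$); the sketch takes the primes and the small exponents as given and only shows how $d$ and $e$ are derived. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
from math import gcd


def rebalanced_keygen(p, q, r1, r2):
    """Derive (N, e, d) from primes p, q and small CRT exponents r1, r2."""
    assert gcd(p - 1, q - 1) == 2
    assert gcd(r1, p - 1) == 1 and gcd(r2, q - 1) == 1
    assert r1 % 2 == r2 % 2

    a = r1 % 2
    hp, hq = (p - 1) // 2, (q - 1) // 2        # relatively prime halves
    u = (r1 - a) // 2 % hp                     # target residue of d' mod (p-1)/2
    v = (r2 - a) // 2 % hq                     # target residue of d' mod (q-1)/2

    # Step 3: CRT on the coprime moduli hp and hq, then d = 2d' + a.
    d_half = u + hp * ((v - u) * pow(hp, -1, hq) % hq)
    d = 2 * d_half + a

    # Step 4: the public exponent is the inverse of d and is typically as large as N.
    e = pow(d, -1, (p - 1) * (q - 1))
    return p * q, e, d


p, q = 23, 47                                  # toy primes with gcd(22, 46) = 2
r1, r2 = 3, 5                                  # toy stand-ins for 160-bit values
n, e, d = rebalanced_keygen(p, q, r1, r2)
m = 100
c = pow(m, e, n)                               # encryption is standard RSA with a large e
```

With these toy values the derived $d$ reduces to $r_1$ modulo $p - 1$ and to $r_2$ modulo $q - 1$, and `pow(c, d, n)` returns `m`.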

Decryption: To decrypt a ciphertext $C$ using the private key $\langle p, q, r_1, r_2 \rangle$ one does:

Step 1: Compute $M_1 \leftarrow C^{r_1} \bmod p$ and $M_2 \leftarrow C^{r_2} \bmod q$.
Step 2: Using the CRT compute an $M \in \mathbb{Z}_N$ such that $M = M_1 \bmod p$ and $M = M_2 \bmod q$. Note that $M = C^d \bmod N$. Hence, the resulting $M$ is a proper decryption of $C$.

Performance. We compare the work required to decrypt using the above scheme to that required using standard RSA. Recall that decryption time for standard RSA with CRT is dominated by two full exponentiations modulo $(n/2)$-bit numbers. In the scheme presented above, the bulk of the decryption work is in the two exponentiations in Step 1, but in each of these the exponent is only $k$ bits long. Since modular exponentiation takes time linear in the exponent's bit-length, we get a speedup of $(n/2)/k$ over standard RSA. For a 1024-bit modulus and 160-bit exponent ($k = 160$), this gives a theoretical speedup of about 3.20 over standard RSA decryption. Our experiments (implemented using GMP) show that in practice we get a speed-up by a factor of 3.06 over standard RSA.

Security. It is an open research problem whether RSA using values of $d$ as above is secure. Since $d$ is large, the usual small-$d$ attacks [16, 3] do not apply. The best known attack on this scheme is described in the following lemma [2].

Lemma. Let $\langle N, e \rangle$ be an RSA public key with $N = pq$. Let $d \in \mathbb{Z}$ be the corresponding RSA private exponent satisfying $d = r_1 \bmod (p-1)$ and $d = r_2 \bmod (q-1)$ with $r_1 < r_2$. Then given $\langle N, e \rangle$ an adversary can expose the private key $d$ in time $O(\sqrt{r_1} \log r_1)$.

The above attack shows that, to obtain security of $2^{80}$, both $r_1$ and $r_2$ must be at least 160 bits long. Consequently, for security reasons $k$ should not be less than 160.

5 Conclusions

We surveyed four variants of RSA designed to speed up RSA decryption and be backwards-compatible with standard RSA. Table 3 gives the decryption speedup factors for each of these variants using a 1024-bit RSA modulus. Batch RSA is fully backwards-compatible, but requires the decrypter to obtain and manage multiple public keys and certificates.
The two multi-factor RSA techniques are promising in that they are fully backwards compatible. The rebalanced RSA method gives a large speedup, but only works with peer applications that properly implement standard RSA, and so are willing to accept RSA certificates with a large encryption exponent $e$. Currently, Internet Explorer rejects all RSA certificates where $e$ is more than 32 bits long. Multi-factor RSA and rebalanced RSA can be combined to give an additional speedup. All these variants can take advantage of advances in algorithms for modular arithmetic (e.g., modular multiplication and exponentiation) on which RSA is built.

Method                    Speedup   Comment
Batch RSA, b = 4          2.64      Requires multiple certificates
Multi-prime, N = pqr      1.73
Multi-power, N = p^2 q    2.30      e = 65537
Rebalanced, k = 160       3.06      Incompatible with Internet Explorer

Table 3: Comparison of RSA variants. Experimental speedup factors for 1024-bit keys.

Acknowledgments

The authors thank Ari Juels for his comments on preliminary versions of this paper.

References

[1] M. Bellare and P. Rogaway. Optimal Asymmetric Encryption. In A. De Santis, ed., Proceedings of Eurocrypt '94, vol. 950 of Lecture Notes in Computer Science (LNCS), pp. 92-111. Springer-Verlag, 1994.
[2] D. Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Notices of the American Mathematical Society, 46(2):203-213, Feb. 1999.
[3] D. Boneh and G. Durfee. Cryptanalysis of RSA with Private Key d Less than N^0.292. IEEE Transactions on Information Theory, 46(4):1339-1349, Jul. 2000. Early version in Proceedings of Eurocrypt '99.
[4] D. Boneh, G. Durfee, and N. Howgrave-Graham. Factoring N = p^r q for Large r. Proceedings of Crypto '99, vol. 1666 of LNCS, pp. 326-337. Springer-Verlag, 1999.
[5] S. Cavallar, B. Dodson, A. K. Lenstra, W. Lioen, P. Montgomery, B. Murphy, H. Riele, K. Aardal, J. Gilchrist, G. Guillerm, P. Leyland, J. Marchand, F. Morain, A. Muffett, C. Putnam, and P. Zimmermann. Factorization of a 512-Bit RSA Modulus. Proceedings of Eurocrypt 2000, vol. 1807 of LNCS, pp. 1-11. Springer-Verlag, 2000.
[6] H. Cohen. A Course in Computational Algebraic Number Theory, vol. 138 of Graduate Texts in Mathematics. Springer-Verlag, 1996.
[7] T. Collins, D. Hopkins, S. Langford, and M. Sabin. Public Key Cryptographic Apparatus and Method. US Patent #5,848,159. Jan. 1997.
[8] A. Fiat. Batch RSA. In G. Brassard, ed., Proceedings of Crypto '89, vol. 435 of LNCS, pp. 175-185. Springer-Verlag, 1989.
[9] RSA Labs. Public Key Cryptography Standards (PKCS), Number 1 Version 2.0. Version 2.1 draft is available at http://www.rsalabs.com/pkcs/pkcs-1/index.html
[10] A. Menezes, P. Van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997.
[11] E. Okamoto and R. Peralta. Faster Factoring of Integers of a Special Form. IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Sciences, E79-A, n. 4 (1996).
[12] R. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures and Public Key Cryptosystems. Commun. of the ACM, 21(2):120-126. Feb. 1978.
[13] H. Shacham and D. Boneh. Improving SSL Handshake Performance via Batching. In D. Naccache, ed., Proceedings of RSA 2001, vol. 2020 of LNCS, pp. 28-43. Springer-Verlag, 2001.
[14] T. Takagi. Fast RSA-type Cryptosystem Modulo p^k q. In H. Krawczyk, ed., Proceedings of Crypto '98, vol. 1462 of LNCS, pp. 318-326. Springer-Verlag, 1998.
[15] R. Silverman and S. Wagstaff Jr. A Practical Analysis of the Elliptic Curve Factoring Algorithm. Math. Comp. 61(203):445-462. Jul. 1993.
[16] M. Wiener. Cryptanalysis of Short RSA Secret Exponents. IEEE Trans. on Info. Th. 36(3):553-558. May 1990.