A Scalable ECC Processor for High-Speed and Light-Weight Implementations with Side-Channel Countermeasures

Size: px
Start display at page:

Download "A Scalable ECC Processor for High-Speed and Light-Weight Implementations with Side-Channel Countermeasures"

Transcription

1 A Scalable ECC Processor for High-Speed and Light-Weight s ith Side-Channel Countermeasures Ahmad Salman, Ahmed Ferozpuri, Ekaat Homsirikamol, Panasayya Yalla, Jens-Peter Kaps, and Kris Gaj Cryptographic Engineering Research Group (CERG) Department of ECE, Volgenau School of Engineering, George Mason University, Fairfax, VA, USA Department of Integrated Science and Technology, James Madison University, Harrisonburg, VA, USA ReConFig-27 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight / 52

2 Outline Introduction Introduction ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 2 / 52

3 Introduction NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Advantages of ECC: Short key lengths, ciphertexts and signatures smaller storage Fast key generation Fast digital signatures High-Speed: lo latency and latency delay Lighteight: Moderate latency delay ith minimal resource usage (area as ell as poer). Min. of Symm. RSA ECC Strength Alg len(p) 8 2TDEA, TDEA 2, AES-28 3, AES-92 7, AES-256 5,36 52 [] NIST SP 8-57, Pt., Rev.4, Recommendation for Key Management: General, Jan 26. [2] NIST FIPS PUB 86-4, Digital Signature Standard (DSS), Jul 23. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 3 / 52

4 Elliptic Curves Introduction NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Elliptic curves built over m K = GF(2 ) Normal basis representation K = GF(p) Polynomial basis representation Elliptic Curve over GF (p) is the set of all pairs (x, y) GF (p) hich fulfill y 2 x 3 + a x + b mod p here a, b GF (p) and 4 a b 2 mod p + a special point called the point at infinity O K=GF (p): Arithmetic operations present in many libraries K=GF (2 m ): Fast in hardare Compact in hardare ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 4 / 52

5 NIST Curves Introduction NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication GF (p) denotes a prime field ith p elements here p is prime. GF (2 m ) denotes a binary field ith 2 m elements for some m (called degree of the field) Recent improvements in attacking discrete logarithms over small-characteristic fields raised security concerns about binary curves (applies only to pairings for the time being). NIST special curves are those hose coefficients and underlying field have been selected to optimize the efficiency of the elliptic curve operations. NIST special primes are of a special type (called generalized Mersenne numbers) for hich modular multiplication can be carried out more efficiently than in general. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 5 / 52

6 ECC Operations Introduction NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Cryptographic Protocols ECC Operations Scalar Multiplication Point Addition Point Doubling Affine to Projective & Projective to Affine GF Operations Inversion Division Addition Multiplication Squaring Interchangeable ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 6 / 52

7 Scalar multiplication Introduction NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Fast and efficient in calculating the scalar multiplication. Does not require a lot of memory. Subject to side-channel attacks. Simple Poer Analysis (SPA) Differential Poer Analysis (DPA) Double-and-Add k P Require: Prime p E(F q ), P = (x, y), here x, y GF (p), k Z, < k < #E, k = (k l, k l 2,..., k ) 2, and k l = Ensure: Q = (x, y ) Q = P for i = l 2 donto do Q = 2Q if k i = then Q = Q + P return Q ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 7 / 52

8 Side Channel Analysis (SCA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Plain Text Cryptographic Device Secret Key Cipher Text ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 8 / 52

9 Side Channel Analysis (SCA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication EM Radiation Side Channels Timing Poer Consumption Plain Text Cryptographic Device Secret Key Cipher Text ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 8 / 52

10 Side Channel Analysis (SCA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Danger: s are susceptible to Side Channel Analysis (SCA). Key space 256-bit, =.2 77 keys EM Radiation Side Channels Timing Poer Consumption Atoms in Universe (ikipedia) SCA allos to attack 8-bit at a time SCA complexity = 892 Plain Text Cryptographic Device Secret Key Cipher Text ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 8 / 52

11 Side Channel Analysis (SCA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Danger: s are susceptible to Side Channel Analysis (SCA). Key space 256-bit, =.2 77 keys EM Radiation Side Channels Timing Poer Consumption Atoms in Universe (ikipedia) SCA allos to attack 8-bit at a time SCA complexity = 892 Plain Text Cryptographic Device Secret Key Cipher Text Danger These are passive, non invasive attacks. They are difficult to detect. The measurement setup is not very expensive ($2 $2,). Applies to Softare as ell as Hardare implementations. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 8 / 52

12 Simple Poer Analysis (SPA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Involves direct interpretation of poer traces.(one or More Traces) Hamming eight of certain byte of the key, Number of rounds in the Algorithm(resp to Key Length) Memory accesses have higher poer consumption, repetitive patterns Cons: Attacker needs to have detailed information about the algorithm. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 9 / 52

13 Differential Poer Analysis (DPA) NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Kocher et al. proposed poer analysis attack in 999. Vcc DPA exploits the data dependency beteen poer consumption of cryptographic device and secret. Detailed knoledge about the attacked device is not required. A GND A C Load DPA in a Nutshell Choose intermediate result, should be function of data & key. 2 Measure the poer consumption. 3 Calculate the hypothetical values and build a poer model. 4 Compare hypothetical poer model ith poer consumption. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight / 52

14 Differential Poer Analysis (DPA) Kocher et al. proposed poer analysis attack in 999. DPA exploits the data dependency beteen poer consumption of cryptographic device and secret. Detailed knoledge about the attacked device is not required. DPA in a Nutshell NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication A Vcc GND A C Load Choose intermediate result, should be function of data & key. 2 Measure the poer consumption. 3 Calculate the hypothetical values and build a poer model. 4 Compare hypothetical poer model ith poer consumption. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight / 52

15 Differential Poer Analysis (DPA) Kocher et al. proposed poer analysis attack in 999. DPA exploits the data dependency beteen poer consumption of cryptographic device and secret. Detailed knoledge about the attacked device is not required. DPA in a Nutshell NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication A Vcc GND A C Load Choose intermediate result, should be function of data & key. 2 Measure the poer consumption. 3 Calculate the hypothetical values and build a poer model. 4 Compare hypothetical poer model ith poer consumption. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight / 52

16 Safe Scalar multiplication NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Montgomery Ladder protects against SPA and safe-error attacks. Requires more computations to calculate the scalar multiplication. Coron s Randomization of the Private Exponent method protects against DPA. Calculate Q = k P instead of Q = kp here k = k + d #E. Montgomery Ladder k P Require: Prime p E(F q ), P = (x, y), here x, y GF (p) k Z, < k < #E, k = (k l, k l 2,..., k ) 2, k l = Ensure: Q = (x, y ) Q = P P = 2Q for i = l 2 donto do if k i = then Q = Q + P P = 2P else P = P + Q Q = 2Q return Q ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight / 52

17 Projective coordinates NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Point Addition Point Doubling Coordinates #Muls #Adds #Invs #Muls #Adds #Invs A + A =A P + A =A 3 7 N/A P + P =P MJ+MJ =MJ A Affine; P Projective; MJ Modified Jacobian Affine: Requires time consuming inverse operation Projective: Only one inversion at the end of a full scalar multiplication Modified Jacobian: Proposed by Cohen et al. Quadruple representation of a point (X, Y, Z, az 4 ) Fast point doubling Easy conversion beteen Affine and Modified Jacobian P A = (x, y) P MJ = (x, y,, a) P MJ = (X, Y, Z, az 4 ) P A = (X /Z 2, Y /Z 3 ) ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 2 / 52

18 Co-Z Scalar multiplication NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication Point addition can be accelerated if both points P and Q share the same Z coordinate. Montgomery Ladder can be efficiently calculated using three algorithms Add ith update (ZADDU) Add ith conjugate (ZADDC) Double ith update (DBLU) Montgomery Ladder k P Require: Prime p E(F q ), P = (X, Y, Z), here X, Y, Z GF (p), k Z, < k < #E, k = (k l, k l 2,..., k ) 2, k l = Ensure: Q = (X, Y, Z) P, Q = DBLU(P) for i = l 2 donto do if k i = then Q, P = ZADDC(P, Q) P, Q = ZADDU(Q, P) else P, Q = ZADDC(Q, P) Q, P = ZADDU(P, Q) return Q Almost as fast as Double-and-ADD algorithm. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 3 / 52

19 Montgomery Multiplication NIST Curves ECC Operations Scalar Multiplication Montgomery Multiplication X, Y, M are n-bit numbers, R = 2 n, Z = X Y mod M Ordinary Montgomery domain domain X X = X R mod M Y Y = Y R mod M Z Z = X Y R mod M Z R mod M Z X Y Z = Mont(X, Y, M) = X Y R mod M = (X R) (Y R) R mod M = X Y R mod M = Z R mod M X X X = Mont(X, R 2 mod M, M) = X R 2 R mod M = X R mod M Z Z Z = Mont(Z,, M) = (Z R) R mod M = Z mod M = Z ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 4 / 52

20 Montgomery Multiplication ECC Scalar Multiplication Montgomery Multiplication Architectures Tenca and Koc introduced a ord-based algorithm for Montgomery multiplication, called Multiple-Word Radix-2 Montgomery Multiplication (MWR2MM), as ell as a scalable hardare architecture capable of performing the multiplication operation using a variable number of Processing elements (PEs). (999) The systolic high-radix design by McIvor et al. is capable of very high speed operation ith the penalty of using large area requirements for fast multiplier units. (24) Kaihara et al. proposed a concept hich enables parallel execution of the Montgomery and Interleaved multiplication. (25) Öksüzoğlu et al. reported DSP-based architecture for lo-cost devices. (28) ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 5 / 52

21 Montgomery Multiplication ECC Scalar Multiplication Montgomery Multiplication s Harris et al. implemented the MWR2MM algorithm by left shifting one of the operands (Y) and the modulus (M) instead of right shifting the intermediate result (S). Their approach led to an improvement in terms of latency and latency area by factor of to. (2) Suzuki combined MWR2MM ith the quotient pipelining technique and proposed an architecture hich can be mapped efficiently onto modern high-performance DSP-oriented FPGA structure. (27) Huang et al. proposed to architectures to optimize the original MWR2MM algorithm to process n-bit precision multiplication in approximately n clock cycles by precomputing intermediate S values. (2) ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 6 / 52

22 ECC Scalar Multiplier Architectures Montgomery Multiplication ECC Scalar Multiplication Örs et al. introduced a module-based design for ECC processors over GF (p). The architecture is suitable for any prime field and any prime. The design uses Montgomery in a systolic array architecture to perform modular multiplication. (23) Amiet et al. implemented a generalized ECC processor hich supports all five of NIST prime fields for any prime p. Their implementation uses to Montgomery multiplier units hich ork in parallel ith a Modular adder/subtractor to perform the prescheduled scalar multiplication operations. (26) MuthuKumar and Jeevananthan proposed a high-speed ECC scalar multiplier over GF (p) and GF (2 m ) for key-size of 256-bits. They use Jacobian coordinates and Montgomery multipliers built of 6 x 6 multiplication units. (2) ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 7 / 52

23 Montgomery Multiplication ECC Scalar Multiplication ECC Scalar Multiplier s Q. Xu et al. designed a lo area ECC multiplier that supports NIST p-6, p-92, and p-256 curves. They proposed a tiny hardare module targeting ASICs. The design has counter measures to side-channel attacks (SPA), hile having average performance. (28) Alrimeih et al. implemented a hardare/softare co-design for ECC processor to perform the scalar multiplication over GF (p). Supports all five prime fields recommended by NIST but also limited to and optimized for their corresponding primes. (24) Sasdrich and Güneysu implemented a hardare accelerator for ECC point multiplication. The design is limited to Curve 2559 using pseudo Mersenne primes. Their ork as expanded later to include techniques for counter measures against SPA and DPA attacks. (25) ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 8 / 52

24 Design Decisions Introduction Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Generalized for all GF (p) curves for a specified field size. External Memory usage Support for ASIC implementations, mapps to embedded memories on FPGAs. Unified high-speed and lighteight storage requirements. Support for all 5 NIST field sizes for a ide range of applications: 92, 224, 256, 384, and 52. Not limited to special primes Optimizations for special primes might be patent restricted. Generic design for FPGA and ASIC, not targeted for special FPGA features: e.g. DSP. High-speed design uses different ord sizes (6, 32, and 64) and redundant representation to achieve high throughput. Lighteight design uses a variable number of PE units (2, 4, or 8) to increase flexibility hile maintaining lo area. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 9 / 52

25 Top Level Architecture Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier FIFO interface Independent initialization of field and curve parameters. Interface ith external memory for ASIC implementations Modular Montgomery Multiplication (MMM) Modular Addition and Subtraction (MAS) clk rst di_data di_valid di_ready W Scheduler W do_data do_valid do_ready MMM MAS start init_field init_curve Mem_Ctrl ECC busy RAM RAM2 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 2 / 52

26 Scheduler Introduction Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Start Normal to Montgomery Conversion Scalar Multiplication Projective to Affine Conversion Montgomery to Normal Conversion Outputs Each state has its on controller making the design modular. start signal triggers a state to begin operation and hands control back to the scheduler by returning a done signal. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 2 / 52

27 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Scheduler: EC Point Addition (unprotected) Algorithm 3(a)[3] Control ROM: Q = P + Q Require: P = (x, y,, a), P = (xr,yr,,a), Q = (X q,y q,z q,az qˆ4) P 2 = (X 2, Y 2, Z 2, az2 4 ) Multiplier Adder Ensure: P + P 2 = P 3 = (X 3, Y 3, Z 3, az3 4 Ops ) Res OP OP2 Res OP OP2 : T Z2 2 T Z q Z q mul 2: T 2 xt T 2 xr T mul 3: T T Z 2 T 3 X 2 T 2 T T Z q T 3 X q T 2 mulsub 4: T yt T yr T mul 5: T 4 T3 2 T 5 Y 2 T T 4 T 3 T 3 T 5 Y q T mulsub 6: T 2 T 2 T 4 T 2 T 2 T 4 mul 7: T 4 T 4 T 3 T 6 2T 2 T 4 T 4 T 3 T 6 T 2 T 2 muladd 8: Z 3 Z 2 T 3 T 6 T 4 + T 6 Z q Z q T 3 T 6 T 4 T 6 muladd 9: T 3 T5 2 T 3 T 5 T 5 mul : T T T 4 X 3 T 3 T 6 T T T 4 X q T 3 T 6 mulsub : az3 4 Z 3 2 T 2 T 2 X 3 az qˆ4 Z q Z q T 2 T 2 X q mulsub 2: T 3 T 5 T 2 T3 T 5 T 2 mul 3: az3 4 (az 3 4)2 Y 3 T 3 T az qˆ4 az qˆ4 az qˆ4 Y q T 3 T mulsub 4: az3 4 a(az 3 4 ) az qˆ4 ar az qˆ4 mul [3] S.B. Örs, L. Batina, B. Preneel, and J. Vandealle, Hardare of an Elliptic Curve Processor over GF (p), in ASAP 23, IEEE, Jun 23. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 22 / 52

28 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Scheduler: EC Point Doubling (unprotected) Algorithm 3(b)[3] Control ROM: Q = 2Q Q = (X q,y q,z q,az qˆ4) Require: P = (X, Y, Z, az 4 ), Multiplier Adder Ensure: 2P = P 3 = (X 3, Y 3, Z 3, az4 4 ) Res OP OP2 Res OP OP2 Ops : T Y 2 T 2 2X T Y q Y q T 2 X q X q muladd 2: T 3 T 2 T 2 2T 2 T 3 T T T 2 T 2 T 2 muladd 3: T T 2 T T 3 2T 3 T T 2 T T 3 T 3 T 3 muladd 4: T 2 X 2 T 3 2T 3 T 2 X q X q T 3 T 3 T 3 muladd 5: T 4 Y Z T 3 2T 3 T 4 Y q Z q T 3 T 3 T 3 muladd 6: T 5 T 3 (az 4) T 6 2T 2 T 5 T 3 az qˆ4 T 6 T 2 T 2 muladd 7: T 2 T 6 + T 2 T 2 T 6 T 2 add 8: T 2 T 2 + (az 4 ) T 2 T 2 az qˆ4 add 9: T 6 T2 2 Z 3 2T 4 T 6 T 2 T 2 Z q T 4 T 4 muladd : T 4 2T T 4 T T add : X 3 T 6 T 4 X q T 6 T 4 sub 2: T T X 3 T T X q sub 3: T 2 T 2 T az3 4 2T 5 T 2 T 2 T az qˆ4 T 5 T 5 muladd 4: Y 3 T 2 T 3 Y q T 2 T 3 sub [3] S.B. Örs, L. Batina, B. Preneel, and J. Vandealle, Hardare of an Elliptic Curve Processor over GF (p), in ASAP 23, IEEE, Jun 23. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 23 / 52

29 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Scheduler: EC Point ZADDC (protected) Algorithm 3[4] Control ROM: Q, P = P + Q Require: P = (X, Y, Z), P = (X p,y p,z), Q = (X q,y q,z) P 2 = (X 2, Y 2, Z) Multiplier Adder Ensure: P + P 2 = P 3 = (X 3, Y 3, Z 3 ) P Res OP OP2 Res OP OP2 Ops : T X X 2 T X p X q sub 2 : T 2 T 2 T 3 Y Y 2 T 2 T T T 3 Y p Y q mulsub 3 : Z 3 ZT T 4 Y + Y 2 Z Z T T 4 Y p Y q muladd 4 : T X T 2 T X p T 2 mul 5 : X 2 X 2 T 2 X q X q T 2 mul 6 : Y 2 T4 2 T 5 T X 2 Y q T 4 T 4 T 5 T X q mulsub 7 : T 2 T3 2 Y 2 Y 2 T T 2 T 3 T 3 Y q Y q T mulsub 8 : T 5 Y T 5 T 2 T 2 T T 5 Y p T 5 T 2 T 2 T mulsub 9 : X 3 Y 2 X 2 X p Y q X q sub : X 2 T 2 X 2 X q T 2 X q sub : Y 2 T X 2 Y q T X q sub 2: T 2 T 3 Y 2 T T X 3 T 2 T 3 Y q T T X p mulsub 3: T 3 T 4 T Y 2 T 2 T 5 T 3 T 4 T Y q T 2 T 5 mulsub 4: Y 3 T 3 T 5 Y p T 3 T 5 sub [4] R. R. Goundar, M. Joye, and A. Miyaji, Co-z addition formula and binary ladders on elliptic curves, CHES. 2, pp ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 24 / 52

30 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Scheduler: EC Point ZADDU (protected) Algorithm 2[4] Control ROM: P, Q = Q + P Require: P = (X, Y, Z), P = (X p,y p,z), Q = (X q,y q,z) P 2 = (X 2, Y 2, Z) Multiplier Adder Ensure: P 2 + P = P 3 = (X 3, Y 3, Z 3 ) P 2 Res OP OP2 Res OP OP2 Ops : T X X 2 T X p X q sub 2 : T 2 T 2 T 3 Y Y 2 T 2 T T T 3 Y p Y q mulsub 3 : Z 3 ZT Z Z T mul 4 : T X 2 T 2 T X q T 2 mul 5 : X X 2 T 2 X p X p T 2 mul 6 : T 4 T3 2 T 5 X T T 4 T 3 T 3 T 5 X p T mulsub 7 : Y Y T 5 T 4 T 4 X Y p Y p T 5 T 4 T 4 X p mulsub 8 : X 3 T 4 T X q T 4 T sub 9 : Y 2 X X 3 Y q X p X q sub : Y 2 T 3 Y 2 Y q T 3 Y q mul : Y 3 Y 2 Y Y q Y q Y p sub [4] R. R. Goundar, M. Joye, and A. Miyaji, Co-z addition formula and binary ladders on elliptic curves, CHES. 2, pp ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 25 / 52

31 Memory Introduction Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Nr RAM idx Name Rˆ2 mod M a 2 2 x 3 3 y 4 4 R 5 5 X q 6 6 Y q 7 7 Z q 8 8 az qˆ4 9 9 T T 2 T T T T MSB52(-4) 6 M 7 M K 9 3 MSB52(6-8) We need to store 8 operands, incl. temporary values, each of maximum size 52 bits. Memory : 6 x 52 bits, Memory 2: 4 x 52 bits. 9 MSB bits of 52-bit operands are stored in MSB52 locations and packed if = 64. Example: Virtual Address X_q RAM idx MSB MSB X_q Physical Address X_q MSB Physical Address ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 26 / 52

32 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

33 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

34 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

35 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

36 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod carry ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

37 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod carry 5 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

38 Modular Adder Subtracter (MAS) Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF first B>A Subtraction mod carry 5 Positive result Done. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52 C

39 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

40 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

41 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

42 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

43 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

44 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. Addition mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

45 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. Addition mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

46 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. Addition mod ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

47 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. Addition mod carry ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

48 Modular Adder Subtracter (MAS) A sign add/subtract last B All supported field sizes except 52 are divisible by 6 and 32. Storing them in ord-size memory does not leave space for sign. FF Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier first B>A Subtraction mod carry 5 Positive result Done. C Subtraction mod carry -8 Negative result addition of modulus. Addition mod carry 6 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 27 / 52

49 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Modular Montgomery Multiplier (MMM) Optimized MWR2MM [4] : if j = then 2: q i = (x i Y () ) S () 3: C () = 4: if j < e then 5: (CO (j+), SO (j), S(j) 2... ) = (, S(j)... ) + C (j) + x i Y (j) + q i M (j) 6: (CE (j+), SE (j), S(j) 2... ) = (, S(j)... ) + C (j) + x i Y (j) + q i M (j) 7: if S (j+) = then 8: C (j+) = CO (j+) 9: S (j)... = (SO(j), S(j) 2... ) : else : C (j+) = CE (j+) 2: S (j) (j)... = (SE, S(j) 2... ) 3: else 4: (C (e), S (e ) ) = (C (e), S (e )... ) + C (e ) + x i Y (e ) + q i M (e ) Task E Task F Task D [4] M. Huang, K. Gaj, and T. El-Ghazai. Ne hardare architectures for Montgomery modular multiplication algorithm, in IEEE ToCo, 6(7), pp , Jul, 2. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 28 / 52

50 t Introduction Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier High-Speed Design (based on architecture 2 [4]) x x x 7 x 8 x 52 =64 n=52 e=9 p=9 PE# Y () S () PE# Y () S () S () S () S () S () S () S (7) S (8) S () S () S (7) S (8) S () S (7) S (8) Task D Task E Task F PE#7 Y (7) S (7) PE#8 Y (8) S (7) S (8) S (8) e clk n clk n number of ords e = number of processing elements p = e number of clock cycles T = n + (e ) Example: = 64, n = 52 e = = 9 T = 52 + (9 ) = 529 Example: = 32, n = 52 e = = 7 T = 52 + (7 ) = 537 /2 T minimal bigger 2e 2p but each PE processes bits/2 area similar ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 29 / 52

51 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier High-Speed Design: supporting different field sizes q i q i Y () Y () M () M () () (2) S S D E PE# C () PE# C (2) q i (e r 2) (e r 2) Y (e 2) M r (e ) S r E (e ) PE# C r e r 2 (e ) bit SREG q q i (e r ) (e r ) Y E/F PE# e r (e ) r M S (e ) r (e ) r C q q q i e r i (e 5 2) i (e 5 ) (e r ) (e 5 2) (e ) Y Y Y 5 (e r ) (e M 5 2) (e ) M M 5 (e +) S r (e S 5 ) E E F (e +) (e ) PE# C r PE# C 5 PE# e r e 5 2 e 5 X i S () X i S () X (e 2) S r i (e 2) r X (e ) S r E/F_sel i (e ) r (e ) bit SREG X X i (e ) r S (e r ) X (e 2) S 5 i (e 2) 5 X (e S 5 ) i (e ) 5 = 6, 32 or 64 e = 92 e = e = e = e = ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 3 / 52

52 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Lighteight Design (based on architecture [4]) Each PE can perform E, D, and F p < e e must store inbeteen values in a queue. size of queue Q = e p n T = n + (e p) + p + p Example: p = 8, = 6, n = 52 e = 33, T = 28, Q = 25 Example: p = 4, = 6, n = 52 e = 33, T = 4325, Q = 29 p/2 > 2T Q increases a bit area/2 for PE t Y () Y () Y (2) Y () Y () Y (2) Y () Y () Y (2) S () S () S (2) S () S () S (2) S () S () S (2) x S () S () S (2) x S () 2 S () S (2) x S () 4 S () S (2) x S () x 3 S () x 5 Q# S () S (2) S () S (2) Task D Task E Task F ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 3 / 52 PE# PE# =2 n=6 e=3 p=2 S () S () S (2)

53 Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier Reading and Reformatting x i en x p en x p+ X (j) x 2p x 2p+ x 2p+2 load x p x en x p+ x en en x p+2 x p+2 x 2 x i x i+ en x i+2 en x x 3p x 2p x p en x i+p sreg E A B C D 92 A= p 224 B= p 256 C= p 384 D= p 52 E= p ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 32 / 52

54 Main Computational Unit Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier M (j) Y (j) reg reg reg sreg E A B C D same as previous diagram 8 x i x i+ x i+2 x i+p F D sreg 9 reg F D sreg 8 reg F D PE# PE# PE#2 8 sreg 2 reg F D PE# p 8 sreg 2 sreg9 p S (j) 5 8 ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 33 / 52

55 Inside the PE Unit Introduction Design Decisions Scheduler Modular Adder Subtracter Modular Montgomery Multiplier X i M (j) Y (j) S (j) +2 mode F (e) C + S (j) 2 2 reg FF en S (j+) 2 S (j) + S (j) 2 S (j) q i C mode D ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 34 / 52

56 Test Setup Introduction Results Conclusions Embedded memories are used only for external RAM. All implementations are coded in VHDL and do not use any other embedded resources. Implemented using Xilinx ISE 4.7, Quartus Prime 6., and Libero SoC.7 and Optimized using ATHENa. All results are Post-Place & Route. Xilinx Altera Microsemi Family Tech Family Tech Family Tech Spartan-6 45 nm Virtex-6 4 nm Stratix IV 4 nm IGLOO2 65 nm Artix-7 28 nm Cyclone V 28 nm Virtex-7 28 nm Stratix V 28 nm Zynq-7 28 nm ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 35 / 52

57 Results Conclusions Latency & Throughput for a given field and idth for HS Latency in clock cycles TP in Op/sec Field size Size of ord (W) at f = MHz W=64 W=32 W=6 W=64 W=32 W=6 92-bit 766, ,229,7, bit,45,978,64,77,43, bit,296,79,462,77,793, bit 2,9,58 3,263,447 3,969, bit 5,398,49 6,7,54 7,255, Average 2,283,646 2,555,26 3,4, bit 824,22 939,969,7, bit,25,24,26,229,564, bit,395,369,584,853,963, bit 3,23,729 3,528,699 4,338, bit 5,79,34 6,52,5 7,925, Average 2,45,932 2,763,252 3,392, TP Throughput; Op Operations; f Frequency Unprotected Protected Average TPs are based on average latencies. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 36 / 52

58 Results Conclusions results of HS designs on Xilinx FPGAs Avg. Latency f TP TP/Area Width Slices LUTs FFs BRAMs [Clock cycles] [MHz] [ Op sec ] [ Op Slices sec ] Virtex7:xc7vx485tffg ,269 3,36 4, ,45, ,227 2,68 4, ,763, ,44 4,66 2 3,392, Virtex6:xc6vlx24tff ,36 3,4 4, ,45, , 2,886 4, ,763, ,74 2,84 4,66 2 3,392, Zynq:xc7z2clg ,35 3,72 4, ,45, ,72 2,72 4, ,763, ,224 2,765 4,66 2 3,392, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 37 / 52

59 Results Conclusions results of HS designs on Altera FPGAs Avg. Latency f TP TP/Area Width ALMs ALUTs FFs MBits [Clock cycles] [MHz] [ Op sec ] [ Op ALMs sec ] Stratix V:5SGXEA7K2F4C3 64 2,998 5,66 5,383 2,48 2,45, ,766 5,76 5,3 2,48 2,763, ,629 4,84 5,84 2,48 3,392, Stratix IV:EP4SE53H35C4 64 4,32 3,936 4,687 2,48 2,45, ,73 4,76 4,582 2,48 2,763, ,68 4,3 4,67 2,48 3,392, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 38 / 52

60 Results Conclusions results of HS designs on Microsemi FPGAs Avg. Latency f TP TP/Area Width LEs 4LUTs FFs RAM [Clock cycles] [MHz] [ Op sec ] [ Op ALMs sec ] IGLOO2:M2GLTS-FG ,846 6,737 5, ,45, ,7 6,7 4, ,763, ,839 5,927 4,926 3,392, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 39 / 52

61 Results Conclusions results of HS designs on ASICs Area Avg. Latency f TP TP/Area Width GEs [Clock cycles] [MHz] [ Op sec ] [ Op GEs sec ] 64 57,669 2,98, ,468 2,386, ,62 2,77, TP is calculated using the average latency at maximum frequency Implemented on 9nm technology Area is counted in terms of NAND2x gates Design area does not include size of memories Results are after synthesis ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 4 / 52

62 Results Conclusions Poer Estimates for HS using Xpoer and Libero Family Avail. P static [mw] P dynamic [mw] P total [mw] LUTs W=64 W=32 W=6 W=64 W=32 W=6 W=64 W=32 W=6 VX7 33, VX6 5,72 3,424 3,423 3, ,5 3,468 3,477 ZQ 53, AX7 63, SN6 9, IG2 2, FY Family; VX Virtex; SN Spartan; AX Artix; ZQ Zynq; IG IGLOO Results are generated under the folloing conditions: Clock at MHz. randomly generated values of k for each of the five fields. Size of k is equal to curve field size. Static poer of VX6 as reported by tool does not seem correct. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 4 / 52

63 Comparison of HS results Results Conclusions Curve f TP TP/Area Work Device Slices LUTs DSPs BRAMs Size Type [MHz] [ Op sec ] [ Op LUTs sec ] 92 GF (p) GF (p) TW[W=32] VX GF (p),227 2, (Protected) 384 GF (p) GF (p) GF (p) GF (p) TW[W=64] VX GF (p),269 3, (Protected) 384 GF (p) GF (p) GF (p) 3, Amiet et. al[w=32] VX GF (p),5.22 N/A 6,86 2 N/A N/A 384 GF (p) 55.8 (Unprotected) 52 GF (p) Amiet et al.[w=64] VX GF (p) N/A 8, N/A N/A (Unprotected) 52 GF (p) p-92 3, p-224 2, Alrimeih et al. VX p-256,2 32, ,5.75 (N/A) 384 p p Roy et al. VX p Baldin et al. VX-5 92 GF (p) N/A 6, N/A N/A GF (p) N/A 7,8 N/A N/A TW This Work; VX Virtex; GF (p) any prime for a given size; p-xxx NIST prime only ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 42 / 52

64 Other results Introduction Results Conclusions Curve f TP TP/Area Work Device Slices LUTs DSPs BRAMs Size Type [MHz] [ Op sec ] [ Op Slices sec ] 92 GF (p) 4, Ghosh et al. VX GF (p) 7, GF (p) 2, p p Ananyi et al. VX p-256 2, p p Güneysu et al. VX p , , , p , ,76.84 Güneysu et al. VX p-256, , McIvor et al. VX GF (p) 5, Guillermin SX-II 256 GF (p) 9, , GF (p) 6, , Schiniakis et al. SX-II 256 GF (p) 9, , GF (p) 3, GF (p) 7, VX Virtex; SX Stratix; ALMs; 2 Op [ ALMs sec ] ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 43 / 52

65 Results Conclusions Latency & Throughput for a given field and no. of PE for LW Latency in clock cycles TP in Op/sec Field size Number of PE units (#PE) at f = MHz #PE=8 #PE=4 #PE=2 #PE=8 #PE=4 #PE=2 92-bit,477,45 2,4,45 4,265, bit 2,22, 3,684,47 6,652, bit 3,73,655 5,23,35 9,58, bit 9,453,25 6,857,85 3,75, bit 22,89,39 4,8,938 79,687, Average 7,82,35 3,993,42 26,365, bit,576,957 2,557,883 4,54, bit 2,358,523 3,922,55 7,74, bit 3,279,973 5,555,83,34, bit,57,53 7,96,72 33,676, bit 24,33,723 44,374,347 84,533, Average 8,37,338 4,865,465 27,99, TP Throughput; Op Operations; f Frequency Unprotected Protected ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 44 / 52

66 Results Conclusions results of LW designs on Xilinx FPGAs Avg. Latency f TP TP/Area # of PEs Slices LUTs FFs BRAMs [Clock cycles] [MHz] [ Op sec ] [ Op Slices sec ] Zynq:xc7z2clg ,697,69 2 8,37, ,353,5 2 4,865, , ,99, Artix:xc7atcsg ,675,69 2 8,37, ,3,5 2 4,865, , ,99, Spartan6:xc7vx485tffg ,758,78 2 8,37, ,36,24 2 4,865, , ,99, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 45 / 52

67 Results Conclusions results of LW designs on Altera FPGAs Avg. Latency f TP TP/Area # of PEs ALMs ALUTs FFs MBits [Clock cycles] [MHz] [ Op sec ] [ Op ALMs sec ] Stratix V:5SGXEA7K2F4C ,5,222 2,48 8,37, , ,8 4,865, ,48 27,99, Cyclone V:5CEBA4F23C , ,786 8,37, , ,875 4,865, ,92 27,99, Stratix IV:EP4SE53H35C4 8,7, ,48 8,37, , ,48 4,865, ,48 27,99, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 46 / 52

68 Results Conclusions results of LW designs on Microsemi FPGAs Avg. Latency f TP TP/Area # of PEs LEs 4LUTs FFs RAM [Clock cycles] [MHz] [ Op sec ] [ Op ALMs sec ] IGLOO2-M2GL5S:FG ,25 2,835,49 2,45, ,753 2,385,34 2,763, ,522 2,79,269 3,392, TP is calculated using the average latency at maximum frequency ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 47 / 52

69 Results Conclusions results of LW designs on ASICs Area Avg. Latency f TP TP/Area Width GEs [Clock cycles] [MHz] [ Op sec ] [ Op GEs sec ] 8 2,696 3,273, ,42 23,832, ,42 44,998, TP is calculated using the average latency at maximum frequency Implemented on 9nm technology Area is counted in terms of NAND2x gates Design area does not include size of memories Results are after synthesis ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 48 / 52

70 Results Conclusions Poer Estimates for LW using Xpoer and Libero Family Avail. P static [mw] P dynamic [mw] P total [mw] LUTs PE=8 PE=4 PE=2 PE=8 PE=4 PE=2 PE=8 PE=4 PE=2 ZQ 53, AX7 63, SN6 9, IG2 6, FY Family; SN Spartan; AX Artix; ZQ Zynq; IG IGLOO Results are generated under the folloing conditions: Clock at MHz. randomly generated values of k for each of the five fields. Size of k is equal to curve field size. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 49 / 52

71 Comparison of LW results Results Conclusions Curve f TP TP/Area Work Device Slices LUTs DSPs BRAMs Size Type [MHz] [ Op sec ] [ Op Slices sec ] 92 GF (p) GF (p).27 TW[#PEs=8] SN GF (p) 48, GF (p) GF (p) 7.5 Driessen et al. SN p N/A N/A N/A Royet al. SN p VX p Varchola et al. VX-2 Pro 224 p N/A VX-2 Pro 256 p N/A Vliegen et al. VX-2 Pro 256 GF (p) 694 N/A TW This Work; VX Virtex; SN Spartan; Multipliers; GF (p) any prime for a given size; p-xxx NIST prime only ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 5 / 52

72 Conclusions Introduction Results Conclusions We designed to implementations of a scalable ECC processor, one for high-speed and one lighteight. Unlike many published results, our processor is not limited to NIST primes. Our TP/Area results are slightly loer than the high-speed design by Amiet, hoever, e use only one MMM, a fraction of the BRAMs, no DSP units neither contribute to TP/Area and our design is protected. In case of ligheight, our design is about 6 larger as compared to Roy s design. But e do not use any DSP units and only to BRAMs. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 5 / 52

73 Results Conclusions Thanks for your attention. ReConFig-27 A. Salman, A. Ferozpuri, P Yalla et al. Scalable ECC Processor, High-Speed Lighteight 52 / 52

Elliptic Curve Cryptography and Security of Embedded Devices

Elliptic Curve Cryptography and Security of Embedded Devices Elliptic Curve Cryptography and Security of Embedded Devices Ph.D. Defense Vincent Verneuil Institut de Mathématiques de Bordeaux Inside Secure June 13th, 2012 V. Verneuil - Elliptic Curve Cryptography

More information

Tripartite Modular Multiplication

Tripartite Modular Multiplication Tripartite Modular Multiplication Kazuo Sakiyama 1,2, Miroslav Knežević 1, Junfeng Fan 1, Bart Preneel 1, and Ingrid Verbauhede 1 1 Katholieke Universiteit Leuven Department of Electrical Engineering ESAT/SCD-COSIC

More information

An Optimized Hardware Architecture of Montgomery Multiplication Algorithm

An Optimized Hardware Architecture of Montgomery Multiplication Algorithm An Optimized Hardware Architecture of Montgomery Multiplication Algorithm Miaoqing Huang 1, Kris Gaj 2, Soonhak Kwon 3, and Tarek El-Ghazawi 1 1 The George Washington University, Washington, DC 20052,

More information

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials C. Shu, S. Kwon and K. Gaj Abstract: The efficient design of digit-serial multipliers

More information

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m )

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m ) 1 / 19 Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m ) Métairie Jérémy, Tisserand Arnaud and Casseau Emmanuel CAIRN - IRISA July 9 th, 2015 ISVLSI 2015 PAVOIS ANR 12 BS02

More information

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) *

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) * Institute for Applied Information Processing and Communications Graz University of Technology Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) * CHES 2002 Workshop on Cryptographic Hardware and Embedded

More information

ABHELSINKI UNIVERSITY OF TECHNOLOGY

ABHELSINKI UNIVERSITY OF TECHNOLOGY On Repeated Squarings in Binary Fields Kimmo Järvinen Helsinki University of Technology August 14, 2009 K. Järvinen On Repeated Squarings in Binary Fields 1/1 Introduction Repeated squaring Repeated squaring:

More information

Hardware implementations of ECC

Hardware implementations of ECC Hardware implementations of ECC The University of Electro- Communications Introduction Public- key Cryptography (PKC) The most famous PKC is RSA and ECC Used for key agreement (Diffie- Hellman), digital

More information

Co-Z Addition Formulæ and Binary Ladders on Elliptic Curves. Raveen Goundar Marc Joye Atsuko Miyaji

Co-Z Addition Formulæ and Binary Ladders on Elliptic Curves. Raveen Goundar Marc Joye Atsuko Miyaji Co-Z Addition Formulæ and Binary Ladders on Elliptic Curves Raveen Goundar Marc Joye Atsuko Miyaji Co-Z Addition Formulæ and Binary Ladders on Elliptic Curves Raveen Goundar Marc Joye Atsuko Miyaji Elliptic

More information

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field Khalid Javeed BEng, MEng A Disertation submitted in fulfilment of the requirements for the award of Doctor

More information

Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware

Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware Kris Gaj Soonhak Kwon Patrick Baier Paul Kohlbrenner Hoang Le Khaleeluddin Mohammed Ramakrishna Bachimanchi George Mason University

More information

Hardware Implementation of an Elliptic Curve Processor over GF(p)

Hardware Implementation of an Elliptic Curve Processor over GF(p) Hardware Implementation of an Elliptic Curve Processor over GF(p) Sıddıka Berna Örs, Lejla Batina,, Bart Preneel, Joos Vandewalle Katholieke Universiteit Leuven, ESAT/SCD-COSIC Kasteelpark Arenberg, B-3

More information

Square Always Exponentiation

Square Always Exponentiation Square Always Exponentiation Christophe Clavier 1 Benoit Feix 1,2 Georges Gagnerot 1,2 Mylène Roussellet 2 Vincent Verneuil 2,3 1 XLIM-Université de Limoges, France 2 INSIDE Secure, Aix-en-Provence, France

More information

Side-channel attacks on PKC and countermeasures with contributions from PhD students

Side-channel attacks on PKC and countermeasures with contributions from PhD students basics Online Side-channel attacks on PKC and countermeasures (Tutorial @SPACE2016) with contributions from PhD students Lejla Batina Institute for Computing and Information Sciences Digital Security Radboud

More information

Implementation of ECM Using FPGA devices. ECE646 Dr. Kris Gaj Mohammed Khaleeluddin Hoang Le Ramakrishna Bachimanchi

Implementation of ECM Using FPGA devices. ECE646 Dr. Kris Gaj Mohammed Khaleeluddin Hoang Le Ramakrishna Bachimanchi Implementation of ECM Using FPGA devices ECE646 Dr. Kris Gaj Mohammed Khaleeluddin Hoang Le Ramakrishna Bachimanchi Introduction Why factor numbers? Security of RSA relies on difficulty to factor large

More information

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks Yufei Ma, Yu Cao, Sarma Vrudhula,

More information

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields . Motivation and introduction An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields Marcin Rogawski Ekawat Homsirikamol Kris Gaj Cryptographic Engineering Research Group (CERG)

More information

Power Analysis to ECC Using Differential Power between Multiplication and Squaring

Power Analysis to ECC Using Differential Power between Multiplication and Squaring Power Analysis to ECC Using Differential Power between Multiplication and Squaring Toru Akishita 1 and Tsuyoshi Takagi 2 1 Sony Corporation, Information Technologies Laboratories, Tokyo, Japan akishita@pal.arch.sony.co.jp

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography J.-L. Beuchat 1 N. Brisebarre 2 J. Detrey 3 E. Okamoto 1 1 University of Tsukuba, Japan 2 École Normale Supérieure de Lyon, France 3 Cosec, b-it, Bonn,

More information

Side Channel Analysis and Protection for McEliece Implementations

Side Channel Analysis and Protection for McEliece Implementations Side Channel Analysis and Protection for McEliece Implementations Thomas Eisenbarth Joint work with Cong Chen, Ingo von Maurich and Rainer Steinwandt 9/27/2016 NATO Workshop- Tel Aviv University Overview

More information

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography.

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography. Power Consumption Analysis General principle: measure the current I in the circuit Arithmetic Level Countermeasures for ECC Coprocessor Arnaud Tisserand, Thomas Chabrier, Danuta Pamula I V DD circuit traces

More information

Hardware Implementation of Elliptic Curve Processor over GF (p)

Hardware Implementation of Elliptic Curve Processor over GF (p) Hardware Implementation of Elliptic Curve Processor over GF (p) Sıddıka Berna Örs, Lejla Batina, Bart Preneel K.U. Leuven ESAT/COSIC Kasteelpark Arenberg 10 B-3001 Leuven-Heverlee, Belgium {Siddika.BernaOrs,

More information

Error-free protection of EC point multiplication by modular extension

Error-free protection of EC point multiplication by modular extension Error-free protection of EC point multiplication by modular extension Martin Seysen February 21, 2017 Giesecke & Devrient GmbH, Prinzregentenstraße 159, D-81677 München, e-mail: m.seysen@gmx.de Abstract

More information

A DPA attack on RSA in CRT mode

A DPA attack on RSA in CRT mode A DPA attack on RSA in CRT mode Marc Witteman Riscure, The Netherlands 1 Introduction RSA is the dominant public key cryptographic algorithm, and used in an increasing number of smart card applications.

More information

Hardware Implementation of Elliptic Curve Cryptography over Binary Field

Hardware Implementation of Elliptic Curve Cryptography over Binary Field I. J. Computer Network and Information Security, 2012, 2, 1-7 Published Online March 2012 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijcnis.2012.02.01 Hardware Implementation of Elliptic Curve Cryptography

More information

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol Introduction to Side Channel Analysis Elisabeth Oswald University of Bristol Outline Part 1: SCA overview & leakage Part 2: SCA attacks & exploiting leakage and very briefly Part 3: Countermeasures Part

More information

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) Stefan Tillich, Johann Großschädl Institute for Applied Information Processing and

More information

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes Wen Wang 1, Jakub Szefer 1, and Ruben Niederhagen 2 1. Yale University, USA 2. Fraunhofer Institute SIT, Germany April 9, 2018 PQCrypto 2018

More information

Hardware Architectures for Public Key Algorithms Requirements and Solutions for Today and Tomorrow

Hardware Architectures for Public Key Algorithms Requirements and Solutions for Today and Tomorrow Hardware Architectures for Public Key Algorithms Requirements and Solutions for Today and Tomorrow Cees J.A. Jansen Pijnenburg Securealink B.V. Vught, The Netherlands ISSE Conference, London 27 September,

More information

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID? Are standards compliant Elliptic Curve Cryptosystems feasible on RFID? Sandeep S. Kumar and Christof Paar Horst Görtz Institute for IT Security, Ruhr-Universität Bochum, Germany Abstract. With elliptic

More information

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs April 16, 2009 John Wawrzynek Spring 2009 EECS150 - Lec24-blocks Page 1 Cross-coupled NOR gates remember, If both R=0 & S=0, then

More information

A VLSI Algorithm for Modular Multiplication/Division

A VLSI Algorithm for Modular Multiplication/Division A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography Jean-Luc Beuchat Laboratory of Cryptography and Information Security Graduate School of Systems and Information Engineering University of Tsukuba 1-1-1

More information

Efficient randomized regular modular exponentiation using combined Montgomery and Barrett multiplications

Efficient randomized regular modular exponentiation using combined Montgomery and Barrett multiplications University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2016 Efficient randomized regular modular exponentiation

More information

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7, HIGH PERFORMANCE MONTGOMERY MULTIPLICATION USING DADDA TREE ADDITION Thandri Adi Varalakshmi Devi 1, P Subhashini 2 1 PG Scholar, Dept of ECE, Kakinada Institute of Technology, Korangi, AP, India. 2 Assistant

More information

Speeding Up Bipartite Modular Multiplication

Speeding Up Bipartite Modular Multiplication Speeding Up Bipartite Modular Multiplication Miroslav Knežević, Frederik Vercauteren, and Ingrid Verbauhede Katholieke Universiteit Leuven Department of Electrical Engineering - ESAT/SCD-COSIC and IBBT

More information

A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p

A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over F p Nicolas Guillermin 1,2 1 DGA Information Superiority, Bruz, France 2 IRMAR, Université Rennes 1, France Abstract. We present

More information

AES [and other Block Ciphers] Implementation Tricks

AES [and other Block Ciphers] Implementation Tricks AES [and other Bloc Ciphers] Implementation Trics Cryptographic algorithms Basic primitives Survey by Stephen et al, LNCS 1482, Sep. 98 General Structure of a Bloc Cipher Useful Properties for Implementing

More information

2. Accelerated Computations

2. Accelerated Computations 2. Accelerated Computations 2.1. Bent Function Enumeration by a Circular Pipeline Implemented on an FPGA Stuart W. Schneider Jon T. Butler 2.1.1. Background A naive approach to encoding a plaintext message

More information

Lessons Learned from High-Speed Implementa6on and Benchmarking of Two Post-Quantum Public-Key Cryptosystems

Lessons Learned from High-Speed Implementa6on and Benchmarking of Two Post-Quantum Public-Key Cryptosystems Lessons Learned from High-Speed Implementa6on and Benchmarking of Two Post-Quantum Public-Key Cryptosystems Malik Umar Sharif, Ahmed Ferozpuri, and Kris Gaj George Mason University USA Partially supported

More information

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs Article Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs E. George Walters III Department of Electrical and Computer Engineering, Penn State Erie,

More information

Side-channel attacks and countermeasures for curve based cryptography

Side-channel attacks and countermeasures for curve based cryptography Side-channel attacks and countermeasures for curve based cryptography Tanja Lange Technische Universiteit Eindhoven tanja@hyperelliptic.org 28.05.2007 Tanja Lange SCA on curves p. 1 Overview Elliptic curves

More information

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur March 19, 2018 Modular Exponentiation Public key Cryptography March 19, 2018 Branch Prediction Attacks 2 / 54 Modular Exponentiation

More information

Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster

Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology WECC 2014, Chennai, India We solved the discrete logarithm of

More information

Side-Channel Attacks on Quantum-Resistant Supersingular Isogeny Diffie-Hellman

Side-Channel Attacks on Quantum-Resistant Supersingular Isogeny Diffie-Hellman Side-Channel Attacks on Quantum-Resistant Supersingular Isogeny Diffie-Hellman Presenter: Reza Azarderakhsh CEECS Department and I-Sense, Florida Atlantic University razarderakhsh@fau.edu Paper by: Brian

More information

Représentation RNS des nombres et calcul de couplages

Représentation RNS des nombres et calcul de couplages Représentation RNS des nombres et calcul de couplages Sylvain Duquesne Université Rennes 1 Séminaire CCIS Grenoble, 7 Février 2013 Sylvain Duquesne (Rennes 1) RNS et couplages Grenoble, 07/02/13 1 / 29

More information

Arithmetic in Integer Rings and Prime Fields

Arithmetic in Integer Rings and Prime Fields Arithmetic in Integer Rings and Prime Fields A 3 B 3 A 2 B 2 A 1 B 1 A 0 B 0 FA C 3 FA C 2 FA C 1 FA C 0 C 4 S 3 S 2 S 1 S 0 http://koclab.org Çetin Kaya Koç Spring 2018 1 / 71 Contents Arithmetic in Integer

More information

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2) An Algorithm for Inversion in GF2 m Suitable for Implementation Using a Polynomial Multiply Instruction on GF2 Katsuki Kobayashi, Naofumi Takagi, and Kazuyoshi Takagi Department of Information Engineering,

More information

Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography. Stefan Tillich, Johann Großschädl

Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography. Stefan Tillich, Johann Großschädl Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography International Workshop on Information Security & Hiding (ISH '05) Institute for Applied Information Processing and Communications

More information

A low-time-complexity and secure dual-field scalar multiplication based on co-z protected NAF

A low-time-complexity and secure dual-field scalar multiplication based on co-z protected NAF LETTER IEICE Electronics Express, Vol.11, No.11, 1 12 A low-time-complexity and secure dual-field scalar multiplication based on co-z protected NAF Jizeng Wei a), Xulong Liu, Hao Liu, and Wei Guo b) School

More information

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC

Single Base Modular Multiplication for Efficient Hardware RNS Implementations of ECC Single Base Modular Multiplication for Efficient Hardare RNS Implementations of ECC Karim Bigou and Arnaud Tisserand CNRS, IRISA, INRIA Centre Rennes - Bretagne Atlantique and University Rennes 1, 6 rue

More information

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose

More information

Design of Low Power Optimized MixColumn/Inverse MixColumn Architecture for AES

Design of Low Power Optimized MixColumn/Inverse MixColumn Architecture for AES Design of Low Power Optimized MixColumn/Inverse MixColumn Architecture for AES Rajasekar P Assistant Professor, Department of Electronics and Communication Engineering, Kathir College of Engineering, Neelambur,

More information

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Miloš D. Ercegovac Computer Science Department Univ. of California at Los Angeles California Robert McIlhenny

More information

Fast and Regular Algorithms for Scalar Multiplication over Elliptic Curves

Fast and Regular Algorithms for Scalar Multiplication over Elliptic Curves Fast and Regular Algorithms for Scalar Multiplication over Elliptic Curves Matthieu Rivain CryptoExperts matthieu.rivain@cryptoexperts.com Abstract. Elliptic curve cryptosystems are more and more widespread

More information

RSA Key Extraction via Low- Bandwidth Acoustic Cryptanalysis. Daniel Genkin, Adi Shamir, Eran Tromer

RSA Key Extraction via Low- Bandwidth Acoustic Cryptanalysis. Daniel Genkin, Adi Shamir, Eran Tromer RSA Key Extraction via Low- Bandwidth Acoustic Cryptanalysis Daniel Genkin, Adi Shamir, Eran Tromer Mathematical Attacks Input Crypto Algorithm Key Output Goal: recover the key given access to the inputs

More information

Hardware Acceleration of the Tate Pairing in Characteristic Three

Hardware Acceleration of the Tate Pairing in Characteristic Three Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1 Introduction Pairing based cryptography is a (fairly)

More information

Arithmetic operators for pairing-based cryptography

Arithmetic operators for pairing-based cryptography 7. Kryptotag November 9 th, 2007 Arithmetic operators for pairing-based cryptography Jérémie Detrey Cosec, B-IT, Bonn, Germany jdetrey@bit.uni-bonn.de Joint work with: Jean-Luc Beuchat Nicolas Brisebarre

More information

Asymmetric Encryption

Asymmetric Encryption -3 s s Encryption Comp Sci 3600 Outline -3 s s 1-3 2 3 4 5 s s Outline -3 s s 1-3 2 3 4 5 s s Function Using Bitwise XOR -3 s s Key Properties for -3 s s The most important property of a hash function

More information

Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware

Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware Kris Gaj 1, Soonhak Kwon 2, Patrick Baier 1, Paul Kohlbrenner 1, Hoang Le 1, Mohammed Khaleeluddin 1, Ramakrishna Bachimanchi

More information

Australian Journal of Basic and Applied Sciences

Australian Journal of Basic and Applied Sciences AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com of SubBytes and InvSubBytes s of AES Algorithm Using Power Analysis Attack Resistant Reversible

More information

Blinded Fault Resistant Exponentiation FDTC 06

Blinded Fault Resistant Exponentiation FDTC 06 Previous Work Our Algorithm Guillaume Fumaroli 1 David Vigilant 2 1 Thales Communications guillaume.fumaroli@fr.thalesgroup.com 2 Gemalto david.vigilant@gemalto.com FDTC 06 Outline Previous Work Our Algorithm

More information

New Implementations of the WG Stream Cipher

New Implementations of the WG Stream Cipher New Implementations of the WG Stream Cipher Hayssam El-Razouk, Arash Reyhani-Masoleh, and Guang Gong Abstract This paper presents two new hardware designs of the WG-28 cipher, one for the multiple output

More information

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols Moncef Amara University of Paris 8 LAGA laboratory Saint-Denis / France Amar Siad University of Paris 8 LAGA

More information

On the Masking Countermeasure and Higher-Order Power Analysis Attacks

On the Masking Countermeasure and Higher-Order Power Analysis Attacks 1 On the Masking Countermeasure and Higher-Order Power Analysis Attacks François-Xavier Standaert, Eric Peeters, Jean-Jacques Quisquater UCL Crypto Group, Place du Levant, 3, B-1348 Louvain-La-Neuve, Belgium.

More information

FPGA-BASED ACCELERATOR FOR POST-QUANTUM SIGNATURE SCHEME SPHINCS-256

FPGA-BASED ACCELERATOR FOR POST-QUANTUM SIGNATURE SCHEME SPHINCS-256 IMES FPGA-BASED ACCELERATOR FOR POST-QUANTUM SIGNATURE SCHEME SPHINCS-256 Dorian Amiet 1, Andreas Curiger 2 and Paul Zbinden 1 1 HSR Hochschule für Technik, Rapperswil, Switzerland 2 Securosys SA, Zürich,

More information

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers Sarani Bhattacharya and Debdeep Mukhopadhyay Indian Institute of Technology Kharagpur PROOFS 2016 August

More information

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System G.Suresh, G.Indira Devi, P.Pavankumar Abstract The use of the improved table look up Residue Number System

More information

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER Indian Journal of Electronics and Electrical Engineering (IJEEE) Vol.2.No.1 2014pp1-6 available at: www.goniv.com Paper Received :05-03-2014 Paper Published:28-03-2014 Paper Reviewed by: 1. John Arhter

More information

Implementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester,

More information

Horizontal and Vertical Side-Channel Attacks against Secure RSA Implementations

Horizontal and Vertical Side-Channel Attacks against Secure RSA Implementations Introduction Clavier et al s Paper This Paper Horizontal and Vertical Side-Channel Attacks against Secure RSA Implementations Aurélie Bauer Éliane Jaulmes Emmanuel Prouff Justine Wild ANSSI Session ID:

More information

Random Delay Insertion: Effective Countermeasure against DPA on FPGAs

Random Delay Insertion: Effective Countermeasure against DPA on FPGAs Random Delay Insertion: Effective Countermeasure against DPA on FPGAs Lu, Yingxi Dr. Máire O Neill Prof. John McCanny Overview September 2004 PRESENTATION OUTLINE DPA and countermeasures Random Delay Insertion

More information

Resistance of Randomized Projective Coordinates Against Power Analysis

Resistance of Randomized Projective Coordinates Against Power Analysis Resistance of Randomized Projective Coordinates Against Power Analysis William Dupuy and Sébastien Kunz-Jacques DCSSI Crypto Lab 51, bd de Latour-Maubourg, 75700 PARIS 07 SP william.dupuy@laposte.net,

More information

Faster ECC over F 2. (feat. PMULL)

Faster ECC over F 2. (feat. PMULL) Faster ECC over F 2 571 (feat. PMULL) Hwajeong Seo 1 Institute for Infocomm Research (I2R), Singapore hwajeong84@gmail.com Abstract. In this paper, we show efficient elliptic curve cryptography implementations

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2016 FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their

More information

FPGA IMPLEMENTATION FOR ELLIPTIC CURVE CRYPTOGRAPHY OVER BINARY EXTENSION FIELD

FPGA IMPLEMENTATION FOR ELLIPTIC CURVE CRYPTOGRAPHY OVER BINARY EXTENSION FIELD University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations 10-5-2017 FPGA IMPLEMENTATION FOR ELLIPTIC CURVE CRYPTOGRAPHY OVER BINARY EXTENSION FIELD Che Chen University of Windsor

More information

Co-Z Addition Formulæ and Binary Lad Elliptic Curves. Goundar, Raveen Ravinesh; Joye, Marc Author(s) Atsuko

Co-Z Addition Formulæ and Binary Lad Elliptic Curves. Goundar, Raveen Ravinesh; Joye, Marc Author(s) Atsuko JAIST Reposi https://dspace.j Title Co-Z Addition Formulæ and Binary Lad Elliptic Curves Goundar, Raveen Ravinesh; Joye, Marc Author(s) Atsuko Citation Lecture Notes in Computer Science, 6 79 Issue Date

More information

Complete addition formulas for prime order elliptic curves

Complete addition formulas for prime order elliptic curves Complete addition formulas for prime order elliptic curves Joost Renes 1 Craig Costello 2 Lejla Batina 1 j.renes@cs.ru.nl 1 Radboud University, Nijmegen, The Netherlands 2 Microsoft Research, Redmond,

More information

6. ELLIPTIC CURVE CRYPTOGRAPHY (ECC)

6. ELLIPTIC CURVE CRYPTOGRAPHY (ECC) 6. ELLIPTIC CURVE CRYPTOGRAPHY (ECC) 6.0 Introduction Elliptic curve cryptography (ECC) is the application of elliptic curve in the field of cryptography.basically a form of PKC which applies over the

More information

Compact Ring LWE Cryptoprocessor

Compact Ring LWE Cryptoprocessor 1 Compact Ring LWE Cryptoprocessor CHES 2014 Sujoy Sinha Roy 1, Frederik Vercauteren 1, Nele Mentens 1, Donald Donglong Chen 2 and Ingrid Verbauwhede 1 1 ESAT/COSIC and iminds, KU Leuven 2 Electronic Engineering,

More information

Outline. EECS Components and Design Techniques for Digital Systems. Lec 18 Error Coding. In the real world. Our beautiful digital world.

Outline. EECS Components and Design Techniques for Digital Systems. Lec 18 Error Coding. In the real world. Our beautiful digital world. Outline EECS 150 - Components and esign Techniques for igital Systems Lec 18 Error Coding Errors and error models Parity and Hamming Codes (SECE) Errors in Communications LFSRs Cyclic Redundancy Check

More information

Efficient Finite Field Multiplication for Isogeny Based Post Quantum Cryptography

Efficient Finite Field Multiplication for Isogeny Based Post Quantum Cryptography Efficient Finite Field Multiplication for Isogeny Based Post Quantum Cryptography Angshuman Karmakar 1 Sujoy Sinha Roy 1 Frederik Vercauteren 1,2 Ingrid Verbauwhede 1 1 COSIC, ESAT KU Leuven and iminds

More information

MODULAR multiplication with large integers is the main

MODULAR multiplication with large integers is the main 1658 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 25, NO. 5, MAY 2017 A General Digit-Serial Architecture for Montgomery Modular Multiplication Serdar Süer Erdem, Tuğrul Yanık,

More information

Randomized Signed-Scalar Multiplication of ECC to Resist Power Attacks

Randomized Signed-Scalar Multiplication of ECC to Resist Power Attacks Randomized Signed-Scalar Multiplication of ECC to Resist Power Attacks Jae Cheol Ha 1 and Sang Jae Moon 2 1 Division of Information Science, Korea Nazarene Univ., Cheonan, Choongnam, 330-718, Korea jcha@kornu.ac.kr

More information

Lecture 8: Sequential Multipliers

Lecture 8: Sequential Multipliers Lecture 8: Sequential Multipliers ECE 645 Computer Arithmetic 3/25/08 ECE 645 Computer Arithmetic Lecture Roadmap Sequential Multipliers Unsigned Signed Radix-2 Booth Recoding High-Radix Multiplication

More information

Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator

Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical & Electronic

More information

On A Large-scale Multiplier for Public Key Cryptographic Hardware

On A Large-scale Multiplier for Public Key Cryptographic Hardware 1,a) 1 1 1 1 1 Wallace tree n log n 64 128 Wallace tree,, Wallace tree,, VHDL On A Large-scale Multiplier for Public Key Cryptographic Hardware Masaaki Shirase 1,a) Kimura Keigo 1 Murayama Hiroyuki 1 Kato

More information

Montgomery Algorithm for Modular Multiplication with Systolic Architecture

Montgomery Algorithm for Modular Multiplication with Systolic Architecture Montgomery Algorithm for Modular Multiplication with ystolic Architecture MRABET Amine LIAD Paris 8 ENIT-TUNI EL MANAR University A - MP - Gardanne PAE 016 1 Plan 1 Introduction for pairing Montgomery

More information

MoTE-ECC: Energy-Scalable Elliptic Curve Cryptography for Wireless Sensor Networks

MoTE-ECC: Energy-Scalable Elliptic Curve Cryptography for Wireless Sensor Networks MoTE-ECC: Energy-Scalable Elliptic Curve Cryptography for Wireless Sensor Networks Zhe Liu 1, Erich Wenger 2, and Johann Großschädl 1 1 University of Luxembourg, Laboratory of Algorithmics, Cryptology

More information

Efficient Application of Countermeasures for Elliptic Curve Cryptography

Efficient Application of Countermeasures for Elliptic Curve Cryptography Efficient Application of Countermeasures for Elliptic Curve Cryptography Vladimir Soukharev, Ph.D. Basil Hess, Ph.D. InfoSec Global Inc. May 19, 2017 Outline Introduction Brief Summary of ECC Arithmetic

More information

Side-Channel Analysis on Blinded Regular Scalar Multiplications

Side-Channel Analysis on Blinded Regular Scalar Multiplications Side-Channel Analysis on Blinded Regular Scalar Multiplications Extended Version Benoit Feix 1 and Mylène Roussellet 2 and Alexandre Venelli 3 1 UL Security Transactions, UK Security Lab benoit.feix@ul.com

More information

International Journal of Advanced Computer Technology (IJACT)

International Journal of Advanced Computer Technology (IJACT) AN EFFICIENT DESIGN OF LOW POWER,FAST EL- LIPTIC CURVE SCALAR MULTIPLIER IN ECC USING S Jayalakshmi K R, M.Tech student, Mangalam college of engineering,kottayam,india; Ms.Hima Sara Jacob, Assistant professor,

More information

Modular Multiplication in GF (p k ) using Lagrange Representation

Modular Multiplication in GF (p k ) using Lagrange Representation Modular Multiplication in GF (p k ) using Lagrange Representation Jean-Claude Bajard, Laurent Imbert, and Christophe Nègre Laboratoire d Informatique, de Robotique et de Microélectronique de Montpellier

More information

A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems

A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems A Refined Power-Analysis Attack on Elliptic Curve Cryptosystems Louis Goubin CP8 Crypto Lab, SchlumbergerSema 36-38 rue de la Princesse, BP45, 78430Louveciennes Cedex, France lgoubin@slb.com Abstract.

More information

Hardware Implementation of the Code-based Key Encapsulation Mechanism using Dyadic GS Codes (DAGS)

Hardware Implementation of the Code-based Key Encapsulation Mechanism using Dyadic GS Codes (DAGS) Hardware Implementation of the Code-based Key Encapsulation Mechanism using Dyadic GS Codes (DAGS) Viet Dang and Kris Gaj ECE Department George Mason University Fairfax, VA, USA Introduction to DAGS The

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

Public Key Algorithms

Public Key Algorithms Public Key Algorithms Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse571-09/

More information

New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields

New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields New Composite Operations and Precomputation Scheme for Elliptic Curve Cryptosystems over Prime Fields Patrick Longa 1 and Ali Miri 2 1 Department of Electrical and Computer Engineering University of Waterloo,

More information

Efficient Hardware Calculation of Inverses in GF (2 8 )

Efficient Hardware Calculation of Inverses in GF (2 8 ) Efficient Hardware Calculation of Inverses in GF (2 8 ) R. W. Ward, Dr. T. C. A. Molteno 1 Physics Department University of Otago Box 56, Dunedin, New Zealand 1 Email: tim@physics.otago.ac.nz Abstract:

More information