Similar documents
A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

Modular Multiplication in GF (p k ) using Lagrange Representation

Modular Reduction without Pre-Computation for Special Moduli

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

Optimal Use of Montgomery Multiplication on Smart Cards

Optimal Extension Field Inversion in the Frequency Domain

Hardware implementations of ECC

Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

Reducing the Complexity of Normal Basis Multiplication

Proceedings, 13th Symposium on Computer Arithmetic, T. Lang, J.-M. Muller, and N. Takagi, editors, pages , Asilomar, California, July 6-9,

High Performance GHASH Function for Long Messages

Efficient Software-Implementation of Finite Fields with Applications to Cryptography

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

A Bit-Serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2 m )

Montgomery Multiplication in GF(2 k ) * **

GF(2 m ) arithmetic: summary

Performance of Finite Field Arithmetic in an Elliptic Curve Cryptosystem

A Knapsack Cryptosystem Based on The Discrete Logarithm Problem

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials

Cryptography IV: Asymmetric Ciphers

PUBLIC-KEY cryptography (PKC), a concept introduced

ISSN (PRINT): , (ONLINE): , VOLUME-5, ISSUE-7,

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves

Hardware Implementation of Elliptic Curve Processor over GF (p)

Public Key 9/17/2018. Symmetric Cryptography Review. Symmetric Cryptography: Shortcomings (1) Symmetric Cryptography: Analogy

recover the secret key [14]. More recently, the resistance of smart-card implementations of the AES candidates against monitoring power consumption wa

PARALLEL MULTIPLICATION IN F 2

Other Public-Key Cryptosystems

ii Abstract Elliptic curves are the basis for a relative new class of public-key schemes. It is predicted that elliptic curves will replace many exist

On the complexity of computing discrete logarithms in the field F

Public Key Cryptography

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers

Arithmétique et Cryptographie Asymétrique

Fast Arithmetic for Public-Key Algorithms in Galois Fields with Composite Exponents

Elliptic Curves and Cryptography

Elliptic Curve Cryptography and Security of Embedded Devices

Cryptanalysis on An ElGamal-Like Cryptosystem for Encrypting Large Messages

Other Public-Key Cryptosystems

Asymmetric Encryption

Représentation RNS des nombres et calcul de couplages

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) *

International Journal of Advanced Computer Technology (IJACT)

Low-Weight Polynomial Form Integers for Efficient Modular Multiplication

Elliptic Curve Public-Key Cryptosystems An Introduction

Batch Verification of ECDSA Signatures AfricaCrypt 2012 Ifrane, Morocco

Pseudo-random Number Generation. Qiuliang Tang

Subquadratic Space Complexity Multiplication over Binary Fields with Dickson Polynomial Representation

Discrete Logarithm Problem

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

An Algorithm for Inversion in GF(2 m ) Suitable for Implementation Using a Polynomial Multiply Instruction on GF(2)

CPE 776:DATA SECURITY & CRYPTOGRAPHY. Some Number Theory and Classical Crypto Systems

Blind Signature Protocol Based on Difficulty of. Simultaneous Solving Two Difficult Problems

High Performance GHASH Function for Long Messages

Elliptic Curve Cryptography

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

Cryptanalysis of a Public Key Cryptosystem Proposed at ACISP 2000

New Variant of ElGamal Signature Scheme

Ti Secured communications

Finite Field Multiplier Architectures for

Arithmetic in Integer Rings and Prime Fields

17 Galois Fields Introduction Primitive Elements Roots of Polynomials... 8

CPSC 467b: Cryptography and Computer Security

Finite Fields. SOLUTIONS Network Coding - Prof. Frank H.P. Fitzek

Montgomery-Suitable Cryptosystems

REDUNDANT TRINOMIALS FOR FINITE FIELDS OF CHARACTERISTIC 2

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

The Elliptic Curve Digital Signature Algorithm (ECDSA) 1 2. Alfred Menezes. August 23, Updated: February 24, 2000

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography

BSIDES multiplication, squaring is also an important

Elliptic Curves I. The first three sections introduce and explain the properties of elliptic curves.

Digital Signature Scheme Based on a New Hard Problem

Low Energy Digit-serial Architectures for large GF(2 m ) multiplication

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

Lecture 1: Introduction to Public key cryptography

Montgomery Multiplier and Squarer in GF(2 m )

Definition of a finite group

8 Elliptic Curve Cryptography

CPSC 467: Cryptography and Computer Security

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

Tripartite Modular Multiplication

Security Level of Cryptography Integer Factoring Problem (Factoring N = p 2 q) December Summary 2

Scalar Multiplication on Koblitz Curves using

On the Key-collisions in the Signature Schemes

Mathematics of Public Key Cryptography

Fast Multiple Point Multiplication on Elliptic Curves over Prime and Binary Fields using the Double-Base Number System

A new algorithm for residue multiplication modulo

Gurgen Khachatrian Martun Karapetyan

assume that the message itself is considered the RNS representation of a number, thus mapping in and out of the RNS system is not necessary. This is p

Trading Inversions for Multiplications in Elliptic Curve Cryptography

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field

Lecture 2: Number Representations (2)

Issues in Implementation of Public Key Cryptosystems

Hardware Implementation of an Elliptic Curve Processor over GF(p)

Efficient Hardware Implementation of Finite Fields with Applications to Cryptography

Transcription:

Implementation Options for Finite Field Arithmetic for Elliptic Curve Cryptosystems Christof Paar Electrical & Computer Engineering Dept. and Computer Science Dept. Worcester Polytechnic Institute Worcester, MA, USA http://www.ece.wpi.edu/research/crypt

Contents 1. Motivation 2. Overview on Finite Field Arithmetic 3. Arithmetic in GF(p) 4. Arithmetic in GF(2 m ) 5. Arithmetic in GF(p m ) 6. Open Problems

Why Public-Key Algorithms? Traditional tool for data security: symmetric) cryptography Private-key (or Main applications: Encryption Message Authentication Traditional shortcomings: 1. Key distribution, especially with large, dynamic user population (Internet) 2. How to assure sender authenticity and non-repudiation? Solution: Public-key schemes, e.g., Die-Hellman key exchange or digital signatures.

Practical Public-Key Algorithms There are three families of PK algorithms of practical relevance: Integer Factorization Schemes Exp: RSA, Rabin, etc. required operand length: 1024{2048 bits arithmetic type: Integer ring Z m Discrete Logarithm Schemes Exp: Die-Hellman, DSA, ElGamal, etc. required operand length: 1024{2048 bits arithmetic type: Finite eld Elliptic Curve Schemes Exp: EC Die-Hellman, ECDSA, etc. required operand length: 160{256 bits arithmetic type: Finite eld

Practical Aspects of PK Algorithms Major problem in practice: All PK algorithms are relatively slow. Observation: Algorithm speed is heavily dependent on arithmetic performance in HW and SW: fast arithmetic fast PK algorithm Interdisciplinary Research area (Computer Science, Electrical Engineering, Mathematics): Ecient nite eld arithmetic for discrete logarithm (DL) and elliptic curve cryptosystems (ECC)

Finite Fields Proposed for Use in PK Schemes finite fields prime fields extension fields general primes GF(p) special form primes m GF(p ) char = 2 char > 2 GF(p) pseudo generalized Mersenne Mersenne binary composite OEF n GF(2 -c) GF(2 n - 2 s... -1) n GF(2 ) GF((2 n ) m ) n GF((2 -c) m )

Platform Options finite field arithmetic hardware software classical reconfig. general proc. constrained environm. ASIC FPGA Intel, RISC embedded up (DSP, smart card,...) Arithmetic performance and area/cost greatly depends on: 1. Platform 2. Finite eld type with strong interaction: platform choice nite eld type

Prime Fields GF (p) General remarks: preferred for DL systems also popular for ECC addition is cheap inversion is much slower than multiplication use of projective coord. for ECC \Remaining" problem: Ecient multiplication algorithms Problem denition: Multiplication with long numbers (160{2048 bits) on processors with short word length (8{64 bits).

General Prime Fields GF (p): Software Exp: A, B GF(p), p<2 1024,word size w = 16 bit element representation: A = a 63 2 63 16 + + a 1 2 16 + a 0, a i {0, 1,...,2 16 1} B = b 63 2 63 16 + + b 1 2 16 + b 0, b i {0, 1,...,2 16 1} 1. Step: Multi-precision Multiplication where C = A B = c 1262 126 16 + + c 12 16 + c 0 c 0 = a 0 b 0 c 1 =. a 0 b 1 + a 1 b 0 +carry Complexity: (n/w) 2 inner products (integer mult), where n = log 2 p. Rem: Quadratic complexity can be reduced to (n/w) 1.58 using Karatsuba algorithm. Further reading: [Menezes/van Oorschot/Vanstone 97]

General Prime Fields GF (p): Software 2. Step: Modular reduction C A B mod p C mod p 1. (nave) approach: long division of C by p 2. (better) approach: fast modulo reduction techniques which avoid division: 2.1. Montgomery 2.2. Barrett 2.3. Sedlack 2.4.... (see, e.g., [Naccache/M'Rahi 96]) inner products + precomputa- Complexity: (n/w) 2 tions Rem: Multi-precision mult (Step 1) and modular reduction (Step 2) can be interleaved. further reading for Montgomery in SW: [Koc et al. 96]

General Prime Fields GF (p): Hardware recall: n = log 2 p Idea: Compute n inner products in parallel Best studied architecture: Montgomery multiplication Input: A, B, where A = n+2 i=0 a i2 i, B = n+1 i=0 b i2 i Output: A B mod N 1. R 0 =0 2. for i =0to n +2 do 3. q i = R i (0) 4. R i+1 =(R i + a i B + q i N)/2 ( ) time complexity (radix 2): n clock cycles time complexity (radix r): n/r clock cycles area complexity: k n gates, k constant Rem: ( ) is performance critical operation

General Prime Fields GF (p): Hardware Remarks 1. is O(n) times faster than software 2. modular reduction is reduced to addition of long numbers: R i+1 =(R i + a i B + q i N)/2 3. use systolic array or redundant representation to avoid long carry chains 4. further reading: [Eldridge/Walter 93] for general HW, [Blum 99] for FPGA

Mersenne Prime Fields GF (2 n 1) Idea: Reduce modular reduction to addition. Central relation: 2 n 1modp Algorithm: let A, B GF(2 n 1) A B = c h 2 n + c l where c h,c l 2 n 1 A B c h + c l mod p Complexity: Modular reduction requires 1 add (as opposed to (n/w) 2 mult in the case of general primes). Remarks: Modular mult complexity is (n/w) 2 inner products Roughly twice as fast as mult with general prime. GF(2 n c), c small, was proposed for ECC in [Crandall 92]

Generalized Mersenne Prime Fields see [NIST 99] Idea: Generalize modulo reduction \trick" from 2 n 1 to primes p =2 n lw ± 2 n l 1w ± ±2 n 1w ± 1 where n l >n l 1 > >n 1 > 0 and w =2 i, often i =16, 32, 64. Let A, B GF(p), and write A B as: A B = c 2nl 12 (2n l 1)w + c 2nl 22 (2n l 2)w + + c 1 2 w + c 0 Coecients c i 2 iw, i>n l, can be reduced recursively: For instance: 2 n lw 2 n l 1w 2 n 1w 1modp 2 (2n l 1)w 2 (n l+n l 1 1)w 2 (n l+n 1 1)w 1modp

Gener. Mersenne Primes: Example p =2 192 2 64 1=2 3 64 2 64 1, w =64 A B = c 5 2 320 + c 4 2 256 + c 3 2 192 + c 2 2 128 + c 1 2 64 + c 0 Reduction equations: 2 320 2 192 + 2 128 modp 2 256 2 128 + 2 64 modp 2 192 2 64 + 1 modp A B c 4 2 256 +[c 5 + c 3 ]2 192 +[c 5 + c 2 ]2 128 + c 1 2 64 +c 0 mod p A B [c 5 + c 3 ]2 192 +[c 5 + c 4 + c 2 ]2 128 +[c 4 + c 1 ]2 64 +c 0 mod p A B [c 5 + c 4 + c 2 ]2 128 +[c 5 + c 4 + c 3 + c 1 ]2 64 +[c 5 + c 3 + c 0 ]modp Reduction requires no multiplication Modular mult complexity is (n/w) 2 inner products Roughly twice as fast as mult with general primes Specic primes are recommended by NIST for ECC

Extension Fields GF (2 m ) applicable to DL and ECC extremely well studied (compared to other characteristics) since 1960s due to applications in coding choice of char = 2 was traditionally driven by hardware implementations arithmetic is greatly inuenced by choice of basis bases proposed for applications: 1. standard (or polynomial) basis 2. normal basis 3. other (dual basis, triangular basis,...) here: focus on polynomial basis.

GF (2 m ) Multiplication in Hardware active research area, many proposed architectures classication according to time-area trade-o arch. type m #clocks #gates Remarks (time) (area) bit parallel any 1 O(m 2 ) often \too big" digit serial any m/d O(mD) D<m hybrid D m m/d O(mD) D<m bit serial any m O(m) classical arch. super serial any ms O(m/s) new, mainly for FPGA [O/P 99] main relevance for cryptography: bit serial, digit serial, and hybrid multipliers

Bit Serial Multiplication Standard basis GF multiplication: A B = (a m 1 x m 1 + a 1 x + a 0 ) (b m 1 x m 1 + b 1 x + b 0 )modp(x) where a i,b i GF(2). Often: P(x) is trinomial or pentanomial Two traditional architectures least signicant bit-rst (LSB) multiplier most signicant bit-rst (MSB) multiplier (see, e.g., [Beth/Gollmann 89])

Least Signicant Bit-First Architecture A B = a 0 B(x) + a 1 [xb(x) modp(x)] + + a m 1 [x(x m 2 B(x)) mod P(x)] Architecture if P(x) is trinomial: b0 b 1 b t b m-2 bm-1 a0, a 1, a 2 c 0 c 1 c t c m-2 c m-1 In every clock cycle compute: 1. mult by x and mod red.: x (x i 1 B(x)) mod P(x) 2. scalar mult by a i and add: + a i [x i B(x)] time complexity: m clock cycles area complexity: cm gates, c small

Hybrid Multipliers work for composite elds GF((2 n ) m ) (see [P/S 97]) total extension degree (nm) can't be prime trades space for speed (faster but larger than LSB) least signicant and most signicant architectures are possible architectures analogous to bit serial mult (LSB, MSB) fundamental idea: process n subeld bits in parallel Recall: Element representation in binary elds A GF(2 nm ) A(x) =a nm 1 x nm 1 + + a 1 x + a 0, a i GF(2) Element representation in composite elds A GF((2 n ) m ) A(x) =a m 1 x m 1 + + a 1 x + a 0, a i GF(2 n )

A B = a 0 B(x) + a 1 [xb(x) modp(x)] + + a m 1 [x(x m 2 B(x)) mod P(x)] Architecture if P(x) is trinomial: p 0 p t n n b0 b1 bt n b m-1 n n a 0, a 1 n n c 0 c 1 c t c m-1 - gate costs occur in GF(2 n ) bit parallel multipliers - area compl.: mn 2 AND + mn 2 XOR - time compl.: m n times faster than LSB

Digit Multipliers relatively new [Song/Parhi 96] trades space for speed (faster but larger than LSB) time and area complexity similar to hybrid multipliers works for any m LSD and MSD are possible fundamental idea: Process D>1 bit at a time.

Least Signicant Digit Architecture 1. Step: Break A(x) down into s digit polynomials, where s = m/d. where A(x) = a m 1 x m 1 + + a 1 + a 0, a i GF(2) A(x) = ~a s 1 (x) x (s 1)D + +~a 1 (x) x D +~a 0 (x) ~a i (x) =a i,d 1 x D 1 + + a i,1 x + a i,0, a i,j GF(2) 2. Step: Digit wise multiplication AB = ~a 0 (x)b(x) modp(x) + ~a 1 (x)[(x D B(x)) mod P(x)] mod P(x) + ~a 2 (x)[x D (x D B(x)) mod P(x)] mod P(x)+ + ~a s 1 (x)[x D (x D(s 2) B(x)) mod P(x)] mod P(x) Operations per clock cycle: 1. multiplication by x D and modular reduction: x D [x (i 1)D B(x) modp(x)] 2. bit parallel multiplication of D m bit polynomials: ~a i (x) [x id B(x) modp(x)]

2. Step: AB = ~a 0 (x)b(x) modp(x) + ~a 1 (x)[(x D B(x)) mod P(x)] mod P(x) + + ~a s 1 (x)[x D (x D(s 2) B(x)) mod P] modp B m a~ a ~ 1 0 D D X mod P m D x m bit mult Accu A B m - mult by x D is mainly a bit permutation - gate costs occur in D m bit parallel mult - area compl.: md AND + md XOR - time compl.: m/d D times faster than LSB

Optimal Extension Fields GF (p m ) relatively new (see [B/P 98]) main applications in ECC small extension degrees of m 3...8are common very fast arithmetic on 64 bit processors

Optimal Extension Fields GF (p m ) Idea: Fully exploit the fast integer arithmetic available in modern microprocessors Design Principles 1. Choose subeld GF(p) to be close to the processor's word size fast subeld multiplication 2. Choose subeld GF(p) to be a pseudo-mersenne prime, that is, p = 2 n ± c, for \small" c fast subeld modular reduction 3. Choose m so that an irreducible binomial P(x) =x m ω exists fast extension eld modular reduction

Subeld Multiplication: a i b j mod p Note: Subeld mult is time critical operation Important: p =2 n c, where c 2 n/2. 2 n c (mod (2 n c)) a i n bits b j c a i b j h l 2n-1 n n-1 0 h, l 2 n 1 a i b j = 2 n h + l a i b j ch + l mod p

Subeld Multiplication: a i b j mod p n/2 bits n bits l c * h h l a i b j ch + l mod p =2 n h + l ch + l mod p l c * h c * h + l n+1 0 Subeld mult complexity: 3 mults by c + adds, shifts OEF mult complexity: 3(m 2 + m 1) int mult (very low for small m) Rem: Major speed-up if c = 1, i.e., p is Mersenne prime

Some Research Problems Fast Galois eld arithmetic in software for general eld polynomials? Hardware arithmetic architectures for some \new" eld types, such as generalized Mersenne prime elds and OEFs? Other GF(2 m ) bases which lead to faster arithmetic? Thorough comparison of standard basis vs. normal basis vs...., especially in software? Faster inversion in GF(p)?

References [1] D. Bailey and C. Paar. Optimal extension elds for fast arithmetic in public-key algorithms. In H. Krawczyk, editor, Advances in Cryptography CRYPTO '98, volume LNCS 1462, pages 472{485. Springer-Verlag, 1998. [2] T. Beth and D. Gollmann. Algorithm engineering for public key algorithms. IEEE Journal on Selected Areas in Communications, 7(4):458{466, 1989. [3] T. Blum. Modular exponentiation on recongurable hardware. Master's thesis, ECE Dept., Worcester Polytechnic Institute, Worcester, USA, May 1999. [4] R. Crandall. Method and apparatus for public key exchange in a cryptographic system. United States Patent, Patent Number 5159632, October 27 1992. [5] S. E. Eldridge and C. D. Walter. Hardware implementation of Montgomery's modular multiplication algorithm. IEEE Transactions on Computers, 42(6):693{ 699, July 1993. [6] C. Koc, T. Acar, and B. Kaliski. Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro, 16:26{33, 1996.

[7] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997. [8] D. Naccache and D. M'Rahi. Cryptographic smart cards. IEEE Micro, 16:14{23, 1996. [9] National Institute of Standard and Technology. Recommended elliptic curves for federal government use. available at http://csrc.nist.gov/encryption, May 1999. [10] G. Orlando and C. Paar. A super-serial Galois elds multiplier for FPGAs and its application to publickey algorithms. In Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM '99, Napa Valley, USA, April 12{23 1997. [11] C. Paar and P. Soria Rodriguez. Fast arithmetic architectures for public-key algorithms over Galois elds GF((2 n ) m ). In W. Fumy, editor, Advances in Cryptography EUROCRYPT '97, volume LNCS 1233, pages 363{378. Springer-Verlag, 1997. [12] L. Song and K. K. Parhi. Low energy digit-serial/ parallel nite eld multipliers. Journal of VLSI Signal Processing, 19(2):149{166, June 1998.