ABHELSINKI UNIVERSITY OF TECHNOLOGY

Similar documents
Another Look at Inversions over Binary Fields

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m )

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs

FPGA Implementation of Point Multiplication on Koblitz Curves Using Kleinian Integers

Scalar Multiplication on Koblitz Curves using

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic operators for pairing-based cryptography

International Journal of Advanced Computer Technology (IJACT)

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

EECS150 - Digital Design Lecture 21 - Design Blocks

Parallel Itoh-Tsujii Multiplicative Inversion Algorithm for a Special Class of Trinomials

Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster

Software implementation of Koblitz curves over quadratic fields

Efficient random number generation on FPGA-s

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

Optimizing scalar multiplication for koblitz curves using hybrid FPGAs

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

Efficient Hardware Calculation of Inverses in GF (2 8 )

FPGA Implementation of Point Multiplication on Koblitz Curves Using Kleinian Integers

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

Cyclic Codes. Saravanan Vijayakumaran August 26, Department of Electrical Engineering Indian Institute of Technology Bombay

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

GF(2 m ) arithmetic: summary

Optimal Extension Field Inversion in the Frequency Domain

Elliptic Curve Group Core Specification. Author: Homer Hsing

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes

Parallel Formulations of Scalar Multiplication on Koblitz Curves

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

High Performance GHASH Function for Long Messages

Hardware Architectures of Elliptic Curve Based Cryptosystems over Binary Fields

Arithmetic Operators for Pairing-Based Cryptography

ECE 645: Lecture 3. Conditional-Sum Adders and Parallel Prefix Network Adders. FPGA Optimized Adders

New Bit-Level Serial GF (2 m ) Multiplication Using Polynomial Basis

Side Channel Analysis and Protection for McEliece Implementations

Hardware Acceleration of the Tate Pairing in Characteristic Three

Australian Journal of Basic and Applied Sciences

Fast point multiplication algorithms for binary elliptic curves with and without precomputation

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

CSCE 564, Fall 2001 Notes 6 Page 1 13 Random Numbers The great metaphysical truth in the generation of random numbers is this: If you want a function

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers

Efficient Polynomial Evaluation Algorithm and Implementation on FPGA

FPGA-based Key Generator for the Niederreiter Cryptosystem using Binary Goppa Codes

Tate Bilinear Pairing Core Specification. Author: Homer Hsing

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

2. Accelerated Computations

A Novel Efficient Hardware Implementation of Elliptic Curve Cryptography Scalar Multiplication using Vedic Multiplier

Pipelined Viterbi Decoder Using FPGA

FPGA Implementation of a Predictive Controller

Arithmetic Operators for Pairing-Based Cryptography

Hardware Implementation of Elliptic Curve Cryptography over Binary Field

Ch 9. Sequential Logic Technologies. IX - Sequential Logic Technology Contemporary Logic Design 1

Hardware implementations of ECC

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography. Stefan Tillich, Johann Großschädl

Power Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography.

Four-Dimensional GLV Scalar Multiplication

Elliptic Curves I. The first three sections introduce and explain the properties of elliptic curves.

Novel Implementation of Finite Field Multipliers over GF(2m) for Emerging Cryptographic Applications

An Efficient Multiplier/Divider Design for Elliptic Curve Cryptosystem over GF(2 m ) *

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

ECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers

HARDWARE REALIZATION OF HIGH SPEED ELLIPTIC CURVE POINT MULTIPLICATION USING PRECOMPUTATION OVER GF(p)

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields

Discrete Logarithm Problem

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves

On Hardware Implementation of Tang-Maitra Boolean Functions

8 Elliptic Curve Cryptography


Finite Fields. SOLUTIONS Network Coding - Prof. Frank H.P. Fitzek

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field

Homework 8 Solutions to Selected Problems

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

High Performance GHASH Function for Long Messages

Digital Electronics Sequential Logic

Constrained Clock Shifting for Field Programmable Gate Arrays

EECS Components and Design Techniques for Digital Systems. Lec 26 CRCs, LFSRs (and a little power)

Montgomery Multiplier and Squarer in GF(2 m )

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago

Design of Low Power Optimized MixColumn/Inverse MixColumn Architecture for AES

Représentation RNS des nombres et calcul de couplages

Highly Efficient GF(2 8 ) Inversion Circuit Based on Redundant GF Arithmetic and Its Application to AES Design

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

A field F is a set of numbers that includes the two numbers 0 and 1 and satisfies the properties:

Faster ECC over F 2. (feat. PMULL)

9.1. Unit 9. Implementing Combinational Functions with Karnaugh Maps or Memories

New Implementations of the WG Stream Cipher

Lessons Learned from High-Speed Implementa6on and Benchmarking of Two Post-Quantum Public-Key Cryptosystems

Design at the Register Transfer Level

Low-Resource and Fast Elliptic Curve Implementations over Binary Edwards Curves

Reducing the Complexity of Normal Basis Multiplication

Digital Logic Design - Chapter 4

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

Transcription:

On Repeated Squarings in Binary Fields Kimmo Järvinen Helsinki University of Technology August 14, 2009 K. Järvinen On Repeated Squarings in Binary Fields 1/1

Introduction Repeated squaring Repeated squaring: a 2e (x) where a(x) F 2 m with polynomial basis Applications in elliptic curve cryptography (e.g., inversions in the field and scalar multiplications on Koblitz curves) Field-programmable gate arrays (FPGAs) Popular implementation platforms for cryptography Existing repeated squarers iterate squaring for e times How to implement efficient repeated squarers with FPGAs? K. Järvinen On Repeated Squarings in Binary Fields 2/1

Introduction Repeated squaring Repeated squaring: a 2e (x) where a(x) F 2 m with polynomial basis Applications in elliptic curve cryptography (e.g., inversions in the field and scalar multiplications on Koblitz curves) Field-programmable gate arrays (FPGAs) Popular implementation platforms for cryptography Existing repeated squarers iterate squaring for e times How to implement efficient repeated squarers with FPGAs? K. Järvinen On Repeated Squarings in Binary Fields 2/1

Repeated squaring in binary fields Squaring is a 2 (x) = m 1 i=0 a ix 2i mod p(x) where a i {0, 1} and p(x) is an irreducible polynomial A linear transformation described by Qa where a = [a 0 a 1... a m 1 ] T and 1 q 0,1 q 0,2 q 0,m 1 0 q 1,1 q 1,2 q 1,m 1 Q =....... 0 q m 1,1 q m 1,2 q m 1,m 1 A repeated squaring is given by Q e a K. Järvinen On Repeated Squarings in Binary Fields 3/1

Repeated squaring in binary fields Squaring is a 2 (x) = m 1 i=0 a ix 2i mod p(x) where a i {0, 1} and p(x) is an irreducible polynomial A linear transformation described by Qa where a = [a 0 a 1... a m 1 ] T and 1 q 0,1 q 0,2 q 0,m 1 0 q 1,1 q 1,2 q 1,m 1 Q =....... 0 q m 1,1 q m 1,2 q m 1,m 1 A repeated squaring is given by Q e a K. Järvinen On Repeated Squarings in Binary Fields 3/1

Look-up tables (LUTs) Basic building block of an FPGA is an n-to-1 bit look-up table (n-lut) Typically, n = 4 but contemporary FPGAs have larger n, e.g., n = 6 (Xilinx Virtex-5) or n = 7 (Altera Stratix-II, and beyond) Notice that using only two inputs of an n-lut costs as much as using all of its inputs K. Järvinen On Repeated Squarings in Binary Fields 4/1

Definitions Definition (Weight and row-weight) Weight, W(Q e ), is the number of ones in Q e ; and Row-weight, W i (Q e ), is the number of ones on the i th row of Q e Definition (Area) Area, A(Q e ), is the number of n-luts required to implement Q e Definition (Critical path) Critical path, D(Q e ), is the length of the longest path of consecutive n-luts in the circuit computing Q e K. Järvinen On Repeated Squarings in Binary Fields 5/1

Weights of the NIST fields K. Järvinen On Repeated Squarings in Binary Fields 6/1

Area and delay Area It is possible to implement Q e with a circuit whose area A n (Q e ) satisfies m A n (Q e Wi (Q e ) 1 ) n 1 Delay i=1 Critical path, D n (Q e ), is bounded by D n (Q e ) max log n W i (Q e ) i K. Järvinen On Repeated Squarings in Binary Fields 7/1

Area and delay Area It is possible to implement Q e with a circuit whose area A n (Q e ) satisfies m A n (Q e Wi (Q e ) 1 ) n 1 Delay i=1 Critical path, D n (Q e ), is bounded by D n (Q e ) max log n W i (Q e ) i K. Järvinen On Repeated Squarings in Binary Fields 7/1

Example Consider computing a 22 (x) in F 2 [x]/x 4 + x + 1. We have 1 1 1 1 Q 2 = 0 1 0 1 0 0 1 1. 0 0 0 1 Weights: W(Q 2 ) = 9 and W 1 (Q 2 ) = 4, W 2 (Q 2 ) = 2, W 3 (Q 2 ) = 2, and W 4 (Q 2 ) = 1. Area: if n = 2, we get A 2 (Q 2 ) 5 (minimum A 2 (Q 2 ) = 4). If n = 4, we get A 4 (Q 2 ) = 3 Delay: if n = 2, we get D 2 (Q 2 ) = 2 and D 4 (Q 2 ) = 1. K. Järvinen On Repeated Squarings in Binary Fields 8/1

Example Consider computing a 22 (x) in F 2 [x]/x 4 + x + 1. We have 1 1 1 1 Q 2 = 0 1 0 1 0 0 1 1. 0 0 0 1 Weights: W(Q 2 ) = 9 and W 1 (Q 2 ) = 4, W 2 (Q 2 ) = 2, W 3 (Q 2 ) = 2, and W 4 (Q 2 ) = 1. Area: if n = 2, we get A 2 (Q 2 ) 5 (minimum A 2 (Q 2 ) = 4). If n = 4, we get A 4 (Q 2 ) = 3 Delay: if n = 2, we get D 2 (Q 2 ) = 2 and D 4 (Q 2 ) = 1. K. Järvinen On Repeated Squarings in Binary Fields 8/1

Example Consider computing a 22 (x) in F 2 [x]/x 4 + x + 1. We have 1 1 1 1 Q 2 = 0 1 0 1 0 0 1 1. 0 0 0 1 Weights: W(Q 2 ) = 9 and W 1 (Q 2 ) = 4, W 2 (Q 2 ) = 2, W 3 (Q 2 ) = 2, and W 4 (Q 2 ) = 1. Area: if n = 2, we get A 2 (Q 2 ) 5 (minimum A 2 (Q 2 ) = 4). If n = 4, we get A 4 (Q 2 ) = 3 Delay: if n = 2, we get D 2 (Q 2 ) = 2 and D 4 (Q 2 ) = 1. K. Järvinen On Repeated Squarings in Binary Fields 8/1

Example Table: Areas and delays for NIST F 2 233 with different n A n (Q e ) D n (Q e ) n 2 3 4 5 6 7 2 3 4 5 6 7 e = 1 153 153 153 153 153 153 1 1 1 1 1 1 e = 2 361 245 230 230 230 230 2 2 2 1 1 1 e = 3 676 385 349 238 233 233 3 2 2 2 1 1 e = 4 1141 616 466 358 349 291 4 3 2 2 2 2 e = 5 1844 973 699 550 466 396 4 3 2 2 2 2 e = 6 2892 1511 1035 812 663 580 5 3 3 2 2 2 K. Järvinen On Repeated Squarings in Binary Fields 9/1

K. Järvinen On Repeated Squarings in Binary Fields 10/1

Implementation: Idea Rather than iterating a squarer for e clock cycles, computing Q e directly, or using unrolled squarers (Q Q... Q, e times) we search a concatenation Q e 1 Q e 2... Q e N with e = N i=1 e i minimizing the metric under optimization (area, delay, etc.) Example If e = 9 and n = 6, the setup minimizing area is Q 3 Q 3 Q 3 which has an area estimate of 699 LUTs and a critical path of 3 LUTs. (Iterative: 153 LUTs + 233 regs / 9 cycles, Direct: 1944 / 3 LUTs, Square chain: 1377 / 9 LUTs) K. Järvinen On Repeated Squarings in Binary Fields 11/1

Implementation: Idea Rather than iterating a squarer for e clock cycles, computing Q e directly, or using unrolled squarers (Q Q... Q, e times) we search a concatenation Q e 1 Q e 2... Q e N with e = N i=1 e i minimizing the metric under optimization (area, delay, etc.) Example If e = 9 and n = 6, the setup minimizing area is Q 3 Q 3 Q 3 which has an area estimate of 699 LUTs and a critical path of 3 LUTs. (Iterative: 153 LUTs + 233 regs / 9 cycles, Direct: 1944 / 3 LUTs, Square chain: 1377 / 9 LUTs) K. Järvinen On Repeated Squarings in Binary Fields 11/1

Implementation: Idea Rather than iterating a squarer for e clock cycles, computing Q e directly, or using unrolled squarers (Q Q... Q, e times) we search a concatenation Q e 1 Q e 2... Q e N with e = N i=1 e i minimizing the metric under optimization (area, delay, etc.) Example If e = 9 and n = 6, the setup minimizing area is Q 3 Q 3 Q 3 which has an area estimate of 699 LUTs and a critical path of 3 LUTs. (Iterative: 153 LUTs + 233 regs / 9 cycles, Direct: 1944 / 3 LUTs, Square chain: 1377 / 9 LUTs) K. Järvinen On Repeated Squarings in Binary Fields 11/1

Implementation: Idea Rather than iterating a squarer for e clock cycles, computing Q e directly, or using unrolled squarers (Q Q... Q, e times) we search a concatenation Q e 1 Q e 2... Q e N with e = N i=1 e i minimizing the metric under optimization (area, delay, etc.) Example If e = 9 and n = 6, the setup minimizing area is Q 3 Q 3 Q 3 which has an area estimate of 699 LUTs and a critical path of 3 LUTs. (Iterative: 153 LUTs + 233 regs / 9 cycles, Direct: 1944 / 3 LUTs, Square chain: 1377 / 9 LUTs) K. Järvinen On Repeated Squarings in Binary Fields 11/1

Implementation: Varying exponent Solution 1 (Distinct exponents, {e 1,..., e l }) Let i = e i e i 1 Find the optimal circuits for each i and concatenate them Select results using a multiplexer Example If E = {1, 2, 4, 8, 16} and n = 6, we get the repeated squarer shown below with an area estimate of 1600 LUTs. K. Järvinen On Repeated Squarings in Binary Fields 12/1

Implementation: Varying exponent Solution 2 (Range, 0 e e max) Let e opt be the exponent that minimizes A n (Qê)/ê Concatenate e max /e opt Q eopt blocks Compute the remaining squarings with a square chain Example If 0 e 14 and n = 6, we get the repeated squarer shown below with an area estimate of 1238 LUTs. K. Järvinen On Repeated Squarings in Binary Fields 13/1

Results Several repeated squarers were synthesized for Spartan-3A and Virtex-5 FPGAs (see the paper) The results show that repeated squarers are small and fast enough to be included in existing finite field processors Example (NIST F 2 233, Virtex-5) Solution 1 with {1, 2, 4, 8, 16}: area 1823 LUTs and delay 8.23 ns Solution 2 with 0 e 11: area 1809 LUTs and delay 8.10 ns K. Järvinen On Repeated Squarings in Binary Fields 14/1

Results Several repeated squarers were synthesized for Spartan-3A and Virtex-5 FPGAs (see the paper) The results show that repeated squarers are small and fast enough to be included in existing finite field processors Example (NIST F 2 233, Virtex-5) Solution 1 with {1, 2, 4, 8, 16}: area 1823 LUTs and delay 8.23 ns Solution 2 with 0 e 11: area 1809 LUTs and delay 8.10 ns K. Järvinen On Repeated Squarings in Binary Fields 14/1

Inversions in binary fields Fermat s Little Theorem a 1 (x) = a 2m 2 (x) Computed with a series of multiplications and (repeated) squarings Itoh and Tsujii: log 2 (m 1) + w(m 1) 1 multiplications and m 1 squarings Example (Inversion in F 2 233) Computed with 10 multiplications and 232 squarings A repeated squarer (solution 1) with e {1, 2, 4, 8, 16} gives the following speedups with different multiplier latencies: M = 18 52 %, M = 6 73 %, and M = 1 88 % (19 repeated squarings instead of 232 squarings) K. Järvinen On Repeated Squarings in Binary Fields 15/1

Inversions in binary fields Fermat s Little Theorem a 1 (x) = a 2m 2 (x) Computed with a series of multiplications and (repeated) squarings Itoh and Tsujii: log 2 (m 1) + w(m 1) 1 multiplications and m 1 squarings Example (Inversion in F 2 233) Computed with 10 multiplications and 232 squarings A repeated squarer (solution 1) with e {1, 2, 4, 8, 16} gives the following speedups with different multiplier latencies: M = 18 52 %, M = 6 73 %, and M = 1 88 % (19 repeated squarings instead of 232 squarings) K. Järvinen On Repeated Squarings in Binary Fields 15/1

Scalar multiplication on Koblitz curves Scalar multiplication on Koblitz curves, kp where k = l 1 i=0 k iτ i, computed with the binary algorithm: w(k) point additions and l 1 Frobenius maps Frobenius map: (x, y) (x 2, y 2 ) e successive Frobenius maps can be computed with two repeated squarings: (x 2e, y 2e ) Example (Scalar multiplication on NIST K-233) k given in width-2 τnaf w(k) m/3 Point addition takes 8M + 13 clock cycles (based on existing work) and we have a repeated squarer (solution 2) with 0 e 11: Speedups: M = 17 3.8 %, M = 8 7.0 %, and M = 5 9.7 % K. Järvinen On Repeated Squarings in Binary Fields 16/1

Scalar multiplication on Koblitz curves Scalar multiplication on Koblitz curves, kp where k = l 1 i=0 k iτ i, computed with the binary algorithm: w(k) point additions and l 1 Frobenius maps Frobenius map: (x, y) (x 2, y 2 ) e successive Frobenius maps can be computed with two repeated squarings: (x 2e, y 2e ) Example (Scalar multiplication on NIST K-233) k given in width-2 τnaf w(k) m/3 Point addition takes 8M + 13 clock cycles (based on existing work) and we have a repeated squarer (solution 2) with 0 e 11: Speedups: M = 17 3.8 %, M = 8 7.0 %, and M = 5 9.7 % K. Järvinen On Repeated Squarings in Binary Fields 16/1

Side-channel resistivity Problem Computing e Frobenius maps takes 2e clock cycles which can be distinguished simply by counting clock cycles from the power trace (confer, weaknesses of the normal binary algorithm). Leaks the positions of nonzero k i Solution Use repeated squarers the attacker sees only a series of point additions and (two) repeated squarings the attacker must be able to disinguish e from the power trace of the repeated squarer (one clock cycle) K. Järvinen On Repeated Squarings in Binary Fields 17/1

Side-channel resistivity Problem Computing e Frobenius maps takes 2e clock cycles which can be distinguished simply by counting clock cycles from the power trace (confer, weaknesses of the normal binary algorithm). Leaks the positions of nonzero k i Solution Use repeated squarers the attacker sees only a series of point additions and (two) repeated squarings the attacker must be able to disinguish e from the power trace of the repeated squarer (one clock cycle) K. Järvinen On Repeated Squarings in Binary Fields 17/1

Summary A new component called repeated squarer computing a 2e (x) directly in one clock cycle was introduced Small and fast enough to be used in existing finite field processors on FPGAs Improves the speed of inversion and scalar multiplication on Koblitz curves Enhances resistivity against side-channel attacks K. Järvinen On Repeated Squarings in Binary Fields 18/1

Thank you. Questions? K. Järvinen On Repeated Squarings in Binary Fields 19/1