Hardware Acceleration of the Tate Pairing in Characteristic Three

Similar documents
Arithmetic Operators for Pairing-Based Cryptography

Hardware Acceleration of the Tate Pairing in Characteristic Three

Arithmetic operators for pairing-based cryptography

Arithmetic Operators for Pairing-Based Cryptography

Some Efficient Algorithms for the Final Exponentiation of η T Pairing

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

Montgomery Algorithm for Modular Multiplication with Systolic Architecture

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University

Tampering attacks in pairing-based cryptography. Johannes Blömer University of Paderborn September 22, 2014

An Algorithm for the η T Pairing Calculation in Characteristic Three and its Hardware Implementation

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

Tate Bilinear Pairing Core Specification. Author: Homer Hsing

Efficient random number generation on FPGA-s

FPGA-based Niederreiter Cryptosystem using Binary Goppa Codes

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

Efficient Implementation of Cryptographic pairings. Mike Scott Dublin City University

Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA

Arithmetic Operators for Pairing-Based Cryptography

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Elliptic Curve Cryptography and Security of Embedded Devices

Représentation RNS des nombres et calcul de couplages

Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers

CprE 281: Digital Logic

Compact Ring LWE Cryptoprocessor

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers*

CSC 774 Advanced Network Security

CSC 774 Advanced Network Security

EECS150 - Digital Design Lecture 21 - Design Blocks

Fixed Argument Pairings

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

CMP 334: Seventh Class

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

ABHELSINKI UNIVERSITY OF TECHNOLOGY

An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields

Side Channel Analysis and Protection for McEliece Implementations

An Introduction to Pairings in Cryptography

High Speed Cryptoprocessor for η T Pairing on 128-bit Secure Supersingular Elliptic Curves over Characteristic Two Fields

ECE 250 / CPS 250 Computer Architecture. Basics of Logic Design Boolean Algebra, Logic Gates

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Cost/Performance Tradeoffs:

Efficient Hardware Architecture for Scalar Multiplications on Elliptic Curves over Prime Field

2. Accelerated Computations

Hardware Architectures of Elliptic Curve Based Cryptosystems over Binary Fields

Hidden pairings and trapdoor DDH groups. Alexander W. Dent Joint work with Steven D. Galbraith

A Dierential Power Analysis attack against the Miller's Algorithm

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

ECE/CS 250 Computer Architecture

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

標数 3 の超特異楕円曲線上の ηt ペアリングの高速実装

Adders, subtractors comparators, multipliers and other ALU elements

High Performance GHASH Function for Long Messages

GF(2 m ) arithmetic: summary

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

Implementing the Elliptic Curve Method of Factoring in Reconfigurable Hardware

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

ABHELSINKI UNIVERSITY OF TECHNOLOGY

One can use elliptic curves to factor integers, although probably not RSA moduli.

Combinational Logic. By : Ali Mustafa

FPGA Implementation of a Predictive Controller

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

SM9 identity-based cryptographic algorithms Part 1: General

The Hash Function JH 1

Outline. EECS Components and Design Techniques for Digital Systems. Lec 18 Error Coding. In the real world. Our beautiful digital world.

An Optimized Hardware Architecture of Montgomery Multiplication Algorithm

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves

6. ELLIPTIC CURVE CRYPTOGRAPHY (ECC)

Number Systems III MA1S1. Tristan McLoughlin. December 4, 2013

Word-length Optimization and Error Analysis of a Multivariate Gaussian Random Number Generator

Accelerating AES Using Instruction Set Extensions for Elliptic Curve Cryptography. Stefan Tillich, Johann Großschädl

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol

Latches. October 13, 2003 Latches 1

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR

Public Key Encryption with Conjunctive Field Keyword Search

Introduction to CMOS VLSI Design Lecture 1: Introduction

Constructing Abelian Varieties for Pairing-Based Cryptography

Efficient Polynomial Evaluation Algorithm and Implementation on FPGA

Adders, subtractors comparators, multipliers and other ALU elements

CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits

17.1 Binary Codes Normal numbers we use are in base 10, which are called decimal numbers. Each digit can be 10 possible numbers: 0, 1, 2, 9.

Lecture 8: Sequential Multipliers

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

CSCE 564, Fall 2001 Notes 6 Page 1 13 Random Numbers The great metaphysical truth in the generation of random numbers is this: If you want a function

EECS 579: Logic and Fault Simulation. Simulation

ECE/CS 250: Computer Architecture. Basics of Logic Design: Boolean Algebra, Logic Gates. Benjamin Lee

EECS150. Arithmetic Circuits

Optimised versions of the Ate and Twisted Ate Pairings

Hardware Operator for Simultaneous Sine and Cosine Evaluation

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

You separate binary numbers into columns in a similar fashion. 2 5 = 32

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

convert a two s complement number back into a recognizable magnitude.

High-Performance FV Somewhat Homomorphic Encryption on GPUs: An Implementation using CUDA

Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks

CPE 776:DATA SECURITY & CRYPTOGRAPHY. Some Number Theory and Classical Crypto Systems

Transcription:

Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1

Introduction Pairing based cryptography is a (fairly) new area: Has provided new instantiations of Identity Based Encryption. Has provided a wealth of new hard problems and proof techniques. Has opened a new area for those interested in implementation. So far, most implementations have been done in software; our main aims before we started were: Compare hardware polynomial and normal basis arithmetic in the finite fields F 3 97 and F 3 89 respectivley. Ideally we wanted same field size but curve selection and FPGA size bit us. Evaluate the size and performance of a flexible pairing accelerator for use in constrained environments. Ignore the fact that η-pairings, MNT curves and Fp arithmetic might be a more modern and better way to go :-) Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 2

Pairing Based Cryptography (1) For our purposes, the pairing is just a map between groups: e : G 1 G 1 G 2 where we usually set G 1 = E(F q ) and G 2 = F q k. The main interesting property of the map is termed bilinearity: e(a P, b Q) = e(p, Q) a b which means we can play about with the exponents at will. To work in a useful way, the map also needs to be: Non-degenerate, i.e. not all e(p, Q) = 1. Computable, i.e. we can evaluate e(p, Q) easily. In real applications we generally use the Tate or Weil pairing. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 4

Pairing Based Cryptography (2) Such pairings were originally thought to only be useful in a destructive setting. Boneh-Franklin identity based encryption is perhaps the most interesting constructive use: The trust authority or TA has a public key PTA = s P for a public value P and secret value s. A users public key is calculated from the string ID using a hash function as P ID = H 1 (ID). A users secret key is calculated by the TA as S ID = s P ID. To encrypt M, select a random r and compute the tuple: C = (r P, M H 2 (e(p ID, P TA ) r )). To decrypt C = (U, V ), we compute the result: M = V H 2 (e(s ID, U)). Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 6

Pairing Based Cryptography (3) We are interested in the case where q = 3 m and k = 6 since this is attractive from a parameterisation perspective. Along with the standard Miller-style BKLS algorithm, there are two closed-form algorithms in this case. Both compute e(p, Q) with P = (x 1, y 1 ) and Q = (x 2, y 2 ). The Duursma-Lee Algorithm The Kwon-BGOS Algorithm f 1 for i = 1 upto m do x 1 x 3 1 y 1 y 3 1 µ x 1 + x 2 + b λ y 1 y 2 σ µ 2 g λ µρ ρ 2 f f g x 2 x 1/3 2 y 2 y 1/3 2 return f q3 1 f 1 x 2 x2 3 y 2 y2 3 d mb for i = 1 upto m do x 1 x1 9 y 1 y1 9 µ x 1 + x 2 + d λ y 1 y 2 σ µ 2 g λ µρ ρ 2 f f 3 g y 2 y 2 d d b return f q3 1 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 8

Hardware Implementation (1) We need quite a few different operations: E(Fq ): Addition, Tripling, Scalar Multiplication. Fq : Addition, Multiplication, Inversion, Cube, Cube Root. Fq k : Addition, Multiplication, Inversion, Cube. Everything depends on high-performance F q arithmetic. We approach is to implement only F q arithmetic in hardware. One can obviously get some different results by exploiting the parallelism in F q k or by building a dedicated pairing circuit. Point Addition Point Tripling Pairing F q k F q Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 10

Hardware Implementation (2) In either basis, our field elements are polynomials with coefficients in F 3. We take the now conventional approach of representing the i-th coefficient a i as two bits: ai L = a i mod 2 ai H = a i div 2 and then constructing basic arithmetic cells using a fairly low-cost arrangement of logic gates: ADD MUL a b a L b L a L b H a H b L a H b H r H r L r a b a L b H a H b L a L b L a H b H r H r L r Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 12

0 0 0 1 Hardware Implementation (4) In a polynomial basis, the multiplication c = a b is performed by normal polynomial multiplication and reduction. We use a digit-wise rather than bit-wise multiplier design: '( " 0 0 ) *+'",.-/ 0 0! "# "%$&" We were able to fit a digit-size of 4 onto our experimental platform. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 14

Hardware Implementation (4) In a normal basis, the multiplication c = a b is performed according to the formula: c k = m 1 i=0 m 1 a k+i j=0 M i,j b k+j The matrix M essentially determines how reduction works, it is very sparse so the whole operation is fairly efficient. The structure of the multiplier allows a similar digit-wise approach, we used a digit-size of 2:! "$#%'&)(+*,%.- 0/,1 &23 154 (+*,%.- 6$6 /,78/"1 %'&)(+*,%.- Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 16

Hardware Implementation (4) In a polynomial basis, cubing can be calculated in a similar way to squaring in characteristic two: That is, we expand the element using the identity: (a i x i ) 3 = a 3 i x 3i = a i x 3i Because of reduction, the cube operation dominates critial path of design since unreduced element is large. Cube root can be calculated using the method of Barreto, for our field u = 32 and v = 5: t 0 = u i=0 a 3ix i t 1 = u 1 i=0 a 3i+1x i t 2 = u 1 i=0 a 3i+2x 3 i a = t0 + t 2u+1 1 t u+v+1 1 + t 2v+2 1 2t u+1 2 2t v+1 2 which turns out to be quite efficient. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 18

Hardware Implementation (5) In a normal basis, cube and cube root are just cyclic shifts of the coefficients: a 3 = (a m 1, a 0,..., a m 3, a m 2 ), 3 a = (a1, a 2,..., a m 1, a 0 ). This was the whole point of investigating their use: Cube is used extensively throughout point and pairing arithmetic. Efficient cube root it vital for Duursma-Lee algorithm. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 20

Hardware Implementation (6) Inversion was always going to be unpleasant: Fortunately we only need it once to perform the final powering which computes the result f q3 1. This computation is decomposed into f 3 3m f 1. The field representation means we only need one inversion in Fq (and some extra operations) to invert in F q k. Since it is only used once, we didn t feel extra hardware was worthwhile. Could have used a variant of the binary EEA, but instead resorted to powering: a 1 = a 3m 2 This turns out to be not too bad but can obviously be improved on depending on the constraints imposed. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 22

Results (1) Used a Xilinx ML300 prototyping device for implementation. Essentially, we put an embedded processor and F 3 m ALU on the Virtex-II PRO FPGA. The hope was to mimic the kind of architecture in a real processor design. F 3 m ALU PowerPC MicroBlaze Registers USB Ethernet LCD ATA PCMCIA Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 24

Results (2) F 3 97 in Polynomial Basis Slices Cycles Instructions Speed At 16 MHz At 150 MHz Add 112 3 1 - - Subtract 112 3 1 - - Multiply 946 28 1 - - Cube 128 3 1 - - Cube Root 115 3 1 - - Pairing Duursma-Lee - 59946 7857 3746.6µs 399.4µs Kwon - 64602 9409 4037.6µs 430.7µs Powering - 4941 397 308.8µs 32.9µs Total 4481 - - - - F 3 89 in Normal Basis Slices Cycles Instructions Speed At 16 MHz At 85 MHz Add 102 3 1 - - Subtract 102 3 1 - - Multiply 1505 48 1 - - Cube 0 3 1 - - Cube Root 0 3 1 - - Pairing Duursma-Lee - 89046 7857 5563.3µs 1047.6µs Kwon - 93702 9409 5856.3µs 1102.4µs Powering - 7941 397 496.3µs 93.4µs Total 4233 - - - - Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 26

Conclusions We can comfortably compute the pairing in under a second even at low clock speeds. There wasn t a lot of advantage from the normal basis arithmetic: Cube and cube root are cheap but multiplier is expensive. The polynomial basis cube root method of Baretto is single-cycle. Finding suitable curves and so on is a nightmare... Using the Kwon-BGOS method seems a better choice. There is plenty of scope for miniaturisation given performance margin: Using Kwon-BGOS removes need for cube-root hardware. Can adjust multiplier digit size, maybe even use a bit-wise design. Share addition logic between adder and multiplier. Reduce storage size by improving register allocation or introduce spillage into main memory. Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 28