Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster

Similar documents
EECS150 - Digital Design Lecture 22 - Arithmetic Blocks, Part 1

EECS 151/251A Fall 2018 Digital Design and Integrated Circuits. Instructor: John Wawrzynek & Nicholas Weaver. Lecture 5 EE141

On the strength comparison of ECC and RSA

ABHELSINKI UNIVERSITY OF TECHNOLOGY

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

EECS150 - Digital Design Lecture 2 - Combinational Logic Review and FPGAs. General Model for Synchronous Systems

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m )

IMPROVING THE PARALLELIZED POLLARD LAMBDA SEARCH ON ANOMALOUS BINARY CURVES

Dual-Field Arithmetic Unit for GF(p) and GF(2 m ) *

Implementation of ECM Using FPGA devices. ECE646 Dr. Kris Gaj Mohammed Khaleeluddin Hoang Le Ramakrishna Bachimanchi

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

XI STANDARD [ COMPUTER SCIENCE ] 5 MARKS STUDY MATERIAL.

Computing Elliptic Curve Discrete Logarithms with the Negation Map

Tate Bilinear Pairing Core Specification. Author: Homer Hsing

Adders, subtractors comparators, multipliers and other ALU elements

EECS150 - Digital Design Lecture 23 - FSMs & Counters

Elliptic Curves I. The first three sections introduce and explain the properties of elliptic curves.

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

Hardware Implementation of Efficient Modified Karatsuba Multiplier Used in Elliptic Curves

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

On the correct use of the negation map in the Pollard rho method

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

Lecture 6: Logical Effort

Gates and Flip-Flops

A Deep Convolutional Neural Network Based on Nested Residue Number System

High-Performance Scalar Multiplication using 8-Dimensional GLV/GLS Decomposition

ECE429 Introduction to VLSI Design

EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap

FPGA Realization of Low Register Systolic All One-Polynomial Multipliers Over GF (2 m ) and their Applications in Trinomial Multipliers

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Are standards compliant Elliptic Curve Cryptosystems feasible on RFID?

Efficient random number generation on FPGA-s

International Journal of Advanced Computer Technology (IJACT)

Adders, subtractors comparators, multipliers and other ALU elements

Arithmetic Operators for Pairing-Based Cryptography

2. Accelerated Computations

EECS150 - Digital Design Lecture 10 - Combinational Logic Circuits Part 1

1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p

Elliptic Curve Group Core Specification. Author: Homer Hsing

ARITHMETIC COMBINATIONAL MODULES AND NETWORKS

Elliptic Curve Cryptography

Hardware Implementation of Elliptic Curve Point Multiplication over GF (2 m ) for ECC protocols

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

ALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1

ECC mod 8^91+5. especially elliptic curve 2y^2=x^3+x for cryptography Andrew Allen and Dan Brown, BlackBerry CFRG, Prague, 2017 July 18

EE 447 VLSI Design. Lecture 5: Logical Effort

VLSI Design, Fall Logical Effort. Jacob Abraham

Introduction to CMOS VLSI Design. Lecture 5: Logical Effort. David Harris. Harvey Mudd College Spring Outline

Theoretical Modeling of the Itoh-Tsujii Inversion Algorithm for Enhanced Performance on k-lut based FPGAs

Review Problem 1. should be on. door state, false if light should be on when a door is open. v Describe when the dome/interior light of the car

ECE 2300 Digital Logic & Computer Organization

Synchronous Sequential Logic

EECS150 - Digital Design Lecture 21 - Design Blocks

COMPUTER ARITHMETIC. 13/05/2010 cryptography - math background pp. 1 / 162

Signatures and DLP. Tanja Lange Technische Universiteit Eindhoven. with some slides by Daniel J. Bernstein

Total Time = 90 Minutes, Total Marks = 50. Total /50 /10 /18

Linear Feedback Shift Registers (LFSRs) 4-bit LFSR

Low complexity bit-parallel GF (2 m ) multiplier for all-one polynomials

Selecting Elliptic Curves for Cryptography Real World Issues

Vidyalankar S.E. Sem. III [CMPN] Digital Logic Design and Analysis Prelim Question Paper Solution

Hardware Design I Chap. 4 Representative combinational logic

EECS150 - Digital Design Lecture 17 - Combinational Logic Circuits. Limitations on Clock Rate - Review

University of Toronto Faculty of Applied Science and Engineering Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Layout of 7400-series Chips Commonly Used in. CDA 3101: Introduction to Computer Hardware and Organization

Hardware Architectures of Elliptic Curve Based Cryptosystems over Binary Fields

Elliptic and Hyperelliptic Curves: a Practical Security Comparison"

Logical Effort. Sizing Transistors for Speed. Estimating Delays

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

Daniel J. Bernstein University of Illinois at Chicago. means an algorithm that a quantum computer can run.

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago

An Efficient Multiplier/Divider Design for Elliptic Curve Cryptosystem over GF(2 m ) *

Lecture 8: Combinational Circuit Design

Frequency Domain Finite Field Arithmetic for Elliptic Curve Cryptography

Module 2. Basic Digital Building Blocks. Binary Arithmetic & Arithmetic Circuits Comparators, Decoders, Encoders, Multiplexors Flip-Flops

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Digital Logic

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec 09 Counters Outline.

EE241 - Spring 2001 Advanced Digital Integrated Circuits

Specialized Cryptanalytic Machines: Two examples, 60 years apart. Patrick Schaumont ECE Department Virginia Tech

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

GF(2 m ) arithmetic: summary

Total Time = 90 Minutes, Total Marks = 100. Total /10 /25 /20 /10 /15 /20

Tripartite Modular Multiplication

Modular Multiplication in GF (p k ) using Lagrange Representation

EE 209 Logic Cumulative Exam Name:

New Implementations of the WG Stream Cipher

Efficient and Secure Algorithms for GLV-Based Scalar Multiplication and Their Implementation on GLV-GLS Curves

Number System. Decimal to binary Binary to Decimal Binary to octal Binary to hexadecimal Hexadecimal to binary Octal to binary

Fundamentals of Digital Design

Efficient Finite Field Multiplication for Isogeny Based Post Quantum Cryptography

Pollard s Rho Algorithm for Elliptic Curves

Spiral 2-4. Function synthesis with: Muxes (Shannon's Theorem) Memories

Vidyalankar. S.E. Sem. III [EXTC] Digital System Design. Q.1 Solve following : [20] Q.1(a) Explain the following decimals in gray code form

An Analysis of Affine Coordinates for Pairing Computation

Arithmetic Operators for Pairing-Based Cryptography

University of Toronto Faculty of Applied Science and Engineering Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Transcription:

Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology WECC 2014, Chennai, India

We solved the discrete logarithm of a 113-bit Koblitz Curve. Challenge generated using SHA-256 Extrapolated 24 days on 18 Virtex-6 FPGAs

ECDLP Records In 2000: Binary Koblitz curve - ECC2K-108 using 9,500 PCs in 126 days In 2004: Binary elliptic curve - ECC2-109 using 2,600 PCs for 510 days In 2012: Elliptic curve over 112-bit prime field using 200 Playstation 3 for 6 months

TU Graz Records IT-Security Lecture 2012: 75 bit in days on quad-core 2013: 80 bit in 17 days on Core i5-2400 Master project: Virtex 6 FPGA 83 bit in (avg) 4.1 days Room for improvement

The higher the security level the lower the speed. With knowledge on the best attacks realistic security bounds are possible. potentially smaller parameters can be used. potentially faster algorithms can be used.

Elliptic Curve Discrete Logarithm Problem Parallelized Pollard s Rho Algorithm

We are looking for

Pollard s Rho Algorithm Iteration function Parallelized Pollard s Rho

Iteration Function

Iteration Function 41-bit Koblitz Curve Reference Iteration function Expected Measured iterations iterations Teske [29] f(x i )=X i + R[j] 929 10 3 906 10 3 Wiener and Zuccherato [31] f(x i )= min 0applel<m l (X i + R[j]) 145 10 3 147 10 3 Gallant et al. [14] f(x i )=X i + l (X i ) 145 10 3 166 10 3 Bailey et al. [4] f(x i )=X i + (l mod 16)/2+3 (X i ) 145 10 3 166 10 3

Architecture

FPGA Development Board

ASIC Design AND OR XOR One NAND Gate: NAND NOR XNOR 4 Transistors 3.136 µm 2 @ UMC 90nm 1.25 1.25 2.5 1 GE 1 2.5 Register/Flip-flop: ~5 GE

FPGA Design X-Ref Target - Figure 3 A6:A1 D COUT D DX C CX B BX A AX O6 DI2 O5 DI1 MC31 WEN CK DI1 MC31 WEN CK DI1 MC31 WEN CK DI1 MC31 WEN CK ug364_03_040209 DX DMUX D DQ C CQ CMUX B BQ BMUX A AQ AMUX Reset Type D FF/LAT INIT1 INIT0 SRHI SRLO SR CE CK FF/LAT INIT1 INIT0 SRHI SRLO FF/LAT INIT1 INIT0 SRHI SRLO FF/LAT INIT1 INIT0 SRHI SRLO D SR CE CK D SR CE CK D SR Q CE CK CIN 0/1 WEN CK Sync/Async FF/LAT A6:A1 O6 O5 C6:1 CX D6:1 DI A6:A1 O6 O5 B6:1 BX A6:A1 W6:W1 W6:W1 W6:W1 W6:W1 O6 O5 A6:1 AX SR CE CLK CE Q CK SR Q Q Q SRHI SRLO INIT1 INIT0 D CE Q CK SR SRHI SRLO INIT1 INIT0 D CE Q CK SR SRHI SRLO INIT1 INIT0 D CE Q CK SR SRHI SRLO INIT1 INIT0 DI2 DI2 DI2 CI BI AI LUT FF

FPGA Development Board

Multiple Small Cores Area Time

Core Idea

ECC Breaker X i c i d i Interface NextInput Point Addition F 2 m multiplier 79 % F 2 m squarer Branching Table F n adder F n adder F 2 m inverter Iteration Function Point Automorphism 14 % Lambda Table FIFO FIFO Distinguished Point Storage FIFO multiplier multiplier X i+1 c i+1 d i+1 F n F n

Point Addition and FF Inversion y 1 y 2 x 2 x 1 a S + M ADD ADD ADD S + M FIFO INV MUL FIFO FIFO 3S + M S + M FIFO SQU ADD 7S + M ADD 14S + M MUL ADD y 3 FIFO x 3 28S + M 56S + M S

Binary Field Multiplier Method Size Parallel 5,497 LUTs Mastrovito 7,104 LUTs Bernstein s Batch Binary Edwards Recursive Karatsuba 4,409 LUTs 3,757 LUTs

Point Automorphism i (P ) i+ smallest Point x y x y Square Square Compare x y C N C N rot 0 rot 1 rot 2 rot 3... x' y' x' y' FIFO i+1 (P ) FIFO comparator tree BARREL ROTATE N C N C x' y'

Details 210 pipeline stages Per default: canonical basis Normal basis used for point automorphism module Karatsuba Multiplier for tion Itoh-Tsujii Inversion tion Montgomery Multiplier based on DSP slices F n F 2 m F 2 m

Computation Time Distinguished Triples 7,000,000 5,250,000 3,500,000 1,750,000 0 0 10 20 30 40 50 Time [Days] April 19th, 2014 Extrapolated: 24 days

Challenge Generation import hashlib PX = str_to_poly(hashlib.sha256(str(0)).hexdigest()) PY=PolynomialRing(K, PY ).gen() P_ROOTS = (PY^2+PX*PY+PX^3+a*PX^2+b).roots() P=E([PX,P_ROOTS[0][0]]); P=P*h QX = str_to_poly(hashlib.sha256(str(1)).hexdigest()) Q_ROOTS = (PY^2+QX*PY+QX^3+a*QX^2+b).roots() Q=E([QX,Q_ROOTS[0][0]]); Q=Q*h

Different FPGAs Series Development Kit LUTs used maximum Frequency Price Virtex-6 ML605 38% 261 MHz 2,495 USD Spartan-6 LX150T - 147 MHz 995 USD Artix-7 AC701 62% 264 MHz 999 USD Virtex-7 VC707 28% 313 MHz 3,495 USD Kintex-7 KC705 42% 313 MHz 1,695 USD

Different Targets Target Iterations Costs [USD] Days (Estimated) ECC2K-112 8.5 x 10 42,000 22 ECC2-113 90 x 10 42,000 118 ECC2K-130 4,055 x 10 1,000,000 127 ECC2-131 46,239 x 10 10,000,000 145 ECC2-163 3,030 x 10 1,000,000,000 189,934

Open Issues Power problems Maximum frequency: 165 MHz vs 275 MHz Multiple instances Negation map and fruitless cycles

Random Facts Necessary budget: 18 FPGAs: 2,500 USD x 18 = 45,000 USD Power consumption: different budget :-) 1.5 man-years: 100,000 USD (different budget) Money actually spent: 20 EUR on chocolate

Room for improvement YES!!!! 2x speed equals 2 extra bits to attack 128x speed equals 14 extra bits to attack

Expected(Number(of(Itera2ons([bits]( 70" 60" 50" 40" 30" 20" 10" 0" Prime numbers: 109, 113, 127, 131, New Challenges 113*bit"Koblitz"a=1" 113*bit"Koblitz"a=0" 113*bit"Weierstrass" 127*bit"Koblitz"a=1" 127*bit"Koblitz"a=0" 127*bit"Weierstrass" 131*bit"Koblitz"a=1" 131*bit"Koblitz"a=0" 131*bit"Weierstrass" Without"Speedup" With"Speedup"

Prime numbers: 109, 113, 127, 131, New Challenges

Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology WECC 2014, Chennai, India