Bit-Parallel Word-Serial Multiplier in GF(2^233) and Its VLSI Implementation. Dr. M. Ahmadi

Bit-Parallel Word-Serial Multiplier in GF(2^233) and Its VLSI Implementation
Supervisors: Dr. Huapeng Wu, Dr. M. Ahmadi
Student: Wenkai Tang

Contents
- Introduction to Finite Field
- Research Motivations
- Proposed Multipliers
- VLSI Design
- Conclusions
- References

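Introduction to Finite Field
Finite field: a set with a finite number of elements on which addition and multiplication are defined, denoted GF.
Example 1: GF(2) = {0, 1, +, *}
- Addition: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0
- Multiplication: 0 * 0 = 0, 0 * 1 = 0, 1 * 0 = 0, 1 * 1 = 1
Example 2: GF(2^2) can be generated by $F(x) = x^2 + x + 1$, where $\{1, x\}$ is called a polynomial basis. The four elements are:
- $0 = (00)$
- $1 = (01)$
- $x = (10)$
- $x + 1 = (11)$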

Finite Field Multiplication
Let $A = (a_{m-1}, \ldots, a_1, a_0)$ and $B = (b_{m-1}, \ldots, b_1, b_0)$ be any two field elements in GF(2^m), where $a_i, b_i \in \{0, 1\}$. Then the product is
$$C = (c_{m-1}, \ldots, c_1, c_0) = AB = \sum_{i=0}^{m-1} \sum_{j=0}^{m-1} a_i b_j x^{i+j} \bmod F(x)$$
This is what we want to implement.
Example: GF(2^2) is generated by $F(x) = x^2 + x + 1$. Let $A = x = (10)$ and $B = x + 1 = (11)$. Then
$$C = AB \bmod F(x) = x(x + 1) \bmod F(x) = (x^2 + x) \bmod (x^2 + x + 1) = 1 = (01)$$
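The definition above can be checked with a few lines of software. The following is a minimal sketch, not the hardware design from these slides: field elements are stored as Python integers whose bit i is the coefficient of x^i, and the function name `gf2m_mul` is a hypothetical helper introduced only for illustration.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m); f encodes F(x), bit i = coefficient of x^i."""
    m = f.bit_length() - 1  # degree of F(x)

    # Schoolbook product over GF(2): XOR together b * x^i for every set bit a_i.
    prod = 0
    i = 0
    while (a >> i) != 0:
        if (a >> i) & 1:
            prod ^= b << i
        i += 1

    # Reduce modulo F(x), cancelling the highest remaining term first.
    for deg in range(prod.bit_length() - 1, m - 1, -1):
        if (prod >> deg) & 1:
            prod ^= f << (deg - m)
    return prod


F4 = 0b111  # F(x) = x^2 + x + 1, the GF(2^2) example
print(gf2m_mul(0b10, 0b11, F4))  # x * (x + 1) mod F(x)
```

With the GF(2^2) example, the call multiplies x by x + 1 and reduces x^2 + x modulo x^2 + x + 1.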

Finite field multipliers: C = AB
Bit-parallel finite field multiplier (figure: Operand A, Operand B, Product C; A, B, C ∈ GF(2^5))
- AND gates: m^2
- XOR gates: m^2 - 1
Bit-serial finite field multiplier (figure: Operand A, Operand B, Product C; A, B, C ∈ GF(2^5))
- AND gates: m
- XOR gates: m + 1
- m-bit registers: 2
- One multiplication needs m clock cycles
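The m-cycle behaviour of a bit-serial multiplier can be mimicked in software. This is a sketch under assumptions: it uses an MSB-first shift-and-add schedule, which is one common bit-serial arrangement and not necessarily the one drawn on the slide, and the GF(2^5) polynomial x^5 + x^2 + 1 is an assumed choice because the slide does not name one.

```python
def bit_serial_mul(a: int, b: int, f: int):
    """MSB-first bit-serial multiply in GF(2^m); returns (product, clock_cycles)."""
    m = f.bit_length() - 1
    c = 0            # accumulator register
    cycles = 0
    for i in range(m - 1, -1, -1):
        c <<= 1                  # multiply the accumulator by x
        if (c >> m) & 1:         # degree reached m: subtract (XOR) F(x)
            c ^= f
        if (a >> i) & 1:         # AND row: add B when bit a_i is set
            c ^= b
        cycles += 1              # one bit of A is consumed per clock cycle
    return c, cycles


F32 = 0b100101  # assumed F(x) = x^5 + x^2 + 1 for GF(2^5)
print(bit_serial_mul(0b10000, 0b00010, F32))  # x^4 * x, in 5 cycles
```

Each loop iteration stands for one clock cycle, so a multiplication in GF(2^m) always takes m iterations.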

Bit-parallel squarer: C = A^2
Architecture (figure: Operand A, Squaring, C; bit-parallel squarer in GF(2^5))
Gate counts: 3 XOR gates
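To see why squaring is so cheap, the sketch below writes out the logic equations of a GF(2^5) squarer. It assumes F(x) = x^5 + x^2 + 1, which the slide does not state; for that polynomial the equations need exactly three XOR gates. The names `square_gf32` and `mul_ref` are hypothetical helpers.

```python
def square_gf32(a: int) -> int:
    """Bit-parallel squarer in GF(2^5), assuming F(x) = x^5 + x^2 + 1."""
    a0, a1, a2, a3, a4 = ((a >> i) & 1 for i in range(5))
    # A^2 = a0 + a1 x^2 + a2 x^4 + a3 x^6 + a4 x^8, with
    # x^6 = x^3 + x and x^8 = x^3 + x^2 + 1 (mod F).
    c0 = a0 ^ a4      # XOR gate 1
    c1 = a3           # wire only
    c2 = a1 ^ a4      # XOR gate 2
    c3 = a3 ^ a4      # XOR gate 3
    c4 = a2           # wire only
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3) | (c4 << 4)


def mul_ref(a: int, b: int, f: int = 0b100101) -> int:
    """Reference shift-and-add multiplication in GF(2^5), used only for checking."""
    m = f.bit_length() - 1
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1
        if (c >> m) & 1:
            c ^= f
        if (a >> i) & 1:
            c ^= b
    return c


# Exhaustive check over all 32 field elements.
print(all(square_gf32(a) == mul_ref(a, a) for a in range(32)))
```

Squaring over GF(2) only spreads the input bits to even positions, so all the logic comes from the reduction step.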

Research Motivations
Smart cards and applications
- A smart card is usually a plastic card that contains a security processor and has many security-related applications:
  - E-commerce
  - Personal finance
  - Health care
  - Campus badges and access
  - Telecommuting and corporate network security
  - GSM cell phones
- Limitations:
  - Low frequency, limited memory size
  - Software implementation of security applications is slow and insecure
  - Area constraint

Smart card and public key cryptosystem
- Public key cryptosystem: key exchange, digital signature, and encryption/decryption
- Elliptic curve (EC) over RSA:
  - Shorter key length than RSA at the same security strength
  - Very suitable for VLSI implementation
  - EC is more suitable for smart cards
- EC operations:
  - Finite field multiplication
  - Finite field squaring
  - Finite field addition
- We will design a finite field multiplier for smart cards

Proposed Multipliers
Choose a finite field. Finite fields recommended by NIST for elliptic curve systems:
Degree | Polynomial
163 | $F(x) = x^{163} + x^7 + x^6 + x^3 + 1$
233 | $F(x) = x^{233} + x^{74} + 1$
283 | $F(x) = x^{283} + x^{12} + x^7 + x^5 + 1$
409 | $F(x) = x^{409} + x^{87} + 1$
571 | $F(x) = x^{571} + x^{10} + x^5 + x^2 + 1$
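For software experiments it is convenient to keep these polynomials as exponent lists and convert them to integers on demand. The sketch below does that; the names `NIST_BINARY_FIELDS` and `poly_to_int` are illustrative and do not come from the slides.

```python
# Exponents of the nonzero terms of each NIST reduction polynomial F(x).
NIST_BINARY_FIELDS = {
    163: (163, 7, 6, 3, 0),
    233: (233, 74, 0),
    283: (283, 12, 7, 5, 0),
    409: (409, 87, 0),
    571: (571, 10, 5, 2, 0),
}


def poly_to_int(exponents) -> int:
    """Encode a polynomial over GF(2) as an integer (bit e set for each term x^e)."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value


F233 = poly_to_int(NIST_BINARY_FIELDS[233])  # x^233 + x^74 + 1
for degree, exps in NIST_BINARY_FIELDS.items():
    kind = "trinomial" if len(exps) == 3 else "pentanomial"
    print(degree, kind)
```

Only the degree 233 and 409 fields use trinomials, which is what keeps the constant multipliers in the proposed design small.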

Bit-Parallel Word-Serial (BPWS) Multiplier
Let $\{1, x, x^2, \ldots, x^{232}\}$ be the polynomial basis for GF(2^233). Let A and B be any two field elements, with
$$A = \sum_{i=0}^{232} a_i x^i, \quad a_i \in GF(2)$$
$$B = \sum_{i=0}^{232} b_i x^i, \quad b_i \in GF(2)$$
The product is
$$C = AB \bmod F(x) = \sum_{i=0}^{232} a_i x^i B \bmod F(x)$$

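Bit-Parallel Word-Serial (BPWS) Multiplier (Cont'd)
Algorithm: partition the coefficients of A into 8-bit words.
$$A = (\underbrace{a_{232}}_{A_{29}}\ \underbrace{a_{231} \ldots a_{224}}_{A_{28}}\ \ldots\ \underbrace{a_7 a_6 \ldots a_0}_{A_0})$$
Let $A_j = a_{8j+7} x^7 + a_{8j+6} x^6 + \cdots + a_{8j}$ for $j = 0, 1, \ldots, 29$ (the top word $A_{29}$ holds only $a_{232}$, since 233 = 29 * 8 + 1). Then
$$A = \sum_{i=0}^{232} a_i x^i = (\ldots(A_{29} x^8 + A_{28}) x^8 + \cdots + A_1) x^8 + A_0$$
$$C = AB \bmod F(x) = (\ldots((A_{29} B x^8 + A_{28} B) x^8 + A_{27} B) x^8 + \cdots + A_1 B) x^8 + A_0 B \bmod F(x)$$
Let $D_j = A_{29-j} B$ for $j = 0, 1, \ldots, 29$, and $C_j = C_{j-1} x^8 + D_j$ for $j = 0, 1, \ldots, 29$, with $C_{-1} = 0$. Then $C = C_{29}$.
Architecture: (figure)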
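The recurrence $C_j = C_{j-1} x^8 + D_j$ can be simulated directly. The following is a minimal software sketch of the most-significant-word-first schedule in GF(2^233) with F(x) = x^233 + x^74 + 1. It models the data flow only, not the gate-level modules, and the helper names (`clmul`, `reduce_mod_f`, `bpws_mul`) are hypothetical.

```python
M = 233
W = 8                                   # word size in bits
F = (1 << 233) | (1 << 74) | 1          # F(x) = x^233 + x^74 + 1
N_WORDS = -(-M // W)                    # ceil(233 / 8) = 30 clock cycles


def clmul(a: int, b: int) -> int:
    """Polynomial (carry-less) product over GF(2), without reduction."""
    r = 0
    while a:
        if a & 1:
            r ^= b
        a >>= 1
        b <<= 1
    return r


def reduce_mod_f(v: int) -> int:
    """Reduce a polynomial modulo F(x), highest term first."""
    for deg in range(v.bit_length() - 1, M - 1, -1):
        if (v >> deg) & 1:
            v ^= F << (deg - M)
    return v


def bpws_mul(a: int, b: int) -> int:
    """Word-serial multiply: one 8-bit word of A per cycle, top word first."""
    c = 0                                            # C_{-1} = 0
    for j in range(N_WORDS):
        word = (a >> (W * (N_WORDS - 1 - j))) & 0xFF  # A_{29-j}
        d = reduce_mod_f(clmul(word, b))              # D_j = A_{29-j} * B
        c = reduce_mod_f(c << W) ^ d                  # C_j = C_{j-1} x^8 + D_j
    return c                                          # C = C_29


print(bpws_mul(1 << 232, 0b10) == ((1 << 74) | 1))  # x^232 * x = x^74 + 1
```

Comparing `bpws_mul(a, b)` with `reduce_mod_f(clmul(a, b))` for arbitrary operands is a quick way to confirm that the Horner-style schedule computes the same product as a full multiplication followed by one reduction.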

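Generating the Product
$D_j = A_{29-j} B$ for $j = 0, 1, \ldots, 29$, and $C_j = C_{j-1} x^8 + D_j$ for $j = 0, 1, \ldots, 29$, with $C_{-1} = 0$.
Clock cycle | Output of M1 | Output of M4 | Output
0 | $D_0$ | 0 | $C_0 = D_0$
1 | $D_1$ | $C_0 x^8$ | $C_1$
2 | $D_2$ | $C_1 x^8$ | $C_2$
... | ... | ... | ...
28 | $D_{28}$ | $C_{27} x^8$ | $C_{28}$
29 | $D_{29}$ | $C_{28} x^8$ | $C_{29} = C$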

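M3: Constant multiplier $\gamma = x^8 \alpha$
Logic equation:
$$\gamma_i = \begin{cases} \alpha_{i+225}, & i = 0, 1, \ldots, 7 \\ \alpha_{i-8}, & i = 8, 9, \ldots, 73 \\ \alpha_{i-8} + \alpha_{i+151}, & i = 74, 75, \ldots, 81 \\ \alpha_{i-8}, & i = 82, 83, \ldots, 232 \end{cases}$$
Circuit: (figure)
Gate count: 8 XOR gates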
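The logic equation for M3 follows from $x^{233} = x^{74} + 1$: the eight coefficients shifted out at the top wrap around to positions 0 to 7 and are also added at positions 74 to 81. The sketch below implements the wiring literally and compares it with a generic shift-and-reduce. Function names are hypothetical.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1  # F(x) = x^233 + x^74 + 1


def times_x8_wired(alpha: int) -> int:
    """gamma = x^8 * alpha mod F(x), written as per-bit wiring plus 8 XORs."""
    a = [(alpha >> i) & 1 for i in range(M)]
    g = [0] * M
    for i in range(M):
        if i <= 7:
            g[i] = a[i + 225]                 # wrapped-around top bits
        elif 74 <= i <= 81:
            g[i] = a[i - 8] ^ a[i + 151]      # the only 8 XOR gates
        else:
            g[i] = a[i - 8]                   # plain shift by 8
    return sum(bit << i for i, bit in enumerate(g))


def times_x8_reference(alpha: int) -> int:
    """Same product computed by shifting and reducing modulo F(x)."""
    v = alpha << 8
    for deg in range(v.bit_length() - 1, M - 1, -1):
        if (v >> deg) & 1:
            v ^= F << (deg - M)
    return v


sample = (1 << 232) | (1 << 225) | 0b1011
print(times_x8_wired(sample) == times_x8_reference(sample))
```

Because F(x) is a trinomial with a low middle term, no reduced bit ever needs a second reduction step, which is why the multiplier costs only eight XOR gates.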

M1: 8 × 233 partial product generator $A_j B$
Function (writing $a_k$ for bit $k$ of the word $A_j$):
$$A_j B = (a_0 + a_1 x + \cdots + a_7 x^7) B = a_0 B + a_1 x B + \cdots + a_7 x^7 B$$
Components:
- Seven constant multipliers
- Eight AND networks
- An XOR network

M1: 8 × 233 partial product generator (Cont'd)
Architecture: (figure)

M1: 8 × 233 partial product generator (Cont'd)
Constant multipliers $x^j \alpha$, $j = 1, 2, \ldots, 7$: similar architecture to M3 ($x^8 \alpha$)

M1: 8 × 233 partial product generator (Cont'd)
AND network: (figure)

M1: 8 × 233 partial product generator (Cont'd)
XOR network: 7 XOR sub networks
Sub XOR network: (figure)
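The partial product generator can be sketched in software as eight gated rows $x^k B$ that are XORed together. This sketch chains a multiply-by-x step to produce the rows, whereas the slides describe seven separate constant multipliers working in parallel, so it models the function rather than the circuit. All names are hypothetical.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1  # F(x) = x^233 + x^74 + 1


def times_x(v: int) -> int:
    """Return v * x mod F(x)."""
    v <<= 1
    if (v >> M) & 1:
        v ^= F
    return v


def partial_product(word: int, b: int) -> int:
    """Return A_j * B mod F(x) for an 8-bit word (bit k of word is a_k)."""
    acc = 0
    row = b                          # row 0 is x^0 * B
    for k in range(8):
        if (word >> k) & 1:          # AND network k: gate row k with a_k
            acc ^= row               # XOR network: sum the gated rows
        if k < 7:
            row = times_x(row)       # row k+1 = x^(k+1) * B (7 multipliers)
    return acc


print(partial_product(0b00000010, 1 << 232) == ((1 << 74) | 1))  # x * x^232
```

Each row is already reduced modulo F(x), so the result of the XOR network needs no further reduction.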

Alternative BPWS finite field multiplier
Least significant word (LSW) first architecture
- One additional m-bit register needed
- One multiplication still needs 30 clock cycles
Architecture: (figure)
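One plausible reading of the LSW-first schedule is sketched below. Since the architecture figure is not reproduced here, the data flow is an assumption: the extra m-bit register is taken to hold $B x^{8j} \bmod F(x)$, which is multiplied by $x^8$ every cycle while the words of A are consumed from the least significant end. Helper names are hypothetical.

```python
M = 233
F = (1 << 233) | (1 << 74) | 1  # F(x) = x^233 + x^74 + 1


def clmul(a: int, b: int) -> int:
    """Polynomial (carry-less) product over GF(2), without reduction."""
    r = 0
    while a:
        if a & 1:
            r ^= b
        a >>= 1
        b <<= 1
    return r


def reduce_mod_f(v: int) -> int:
    """Reduce a polynomial modulo F(x), highest term first."""
    for deg in range(v.bit_length() - 1, M - 1, -1):
        if (v >> deg) & 1:
            v ^= F << (deg - M)
    return v


def bpws_mul_lsw_first(a: int, b: int) -> int:
    """Word-serial multiply consuming A_0 first; assumed LSW-first data flow."""
    c = 0                # accumulator register
    shifted_b = b        # additional register: B * x^(8j) mod F(x)
    for j in range(30):
        word = (a >> (8 * j)) & 0xFF                   # A_j
        c ^= reduce_mod_f(clmul(word, shifted_b))      # add A_j * B * x^(8j)
        shifted_b = reduce_mod_f(shifted_b << 8)       # prepare the next cycle
    return c


print(bpws_mul_lsw_first(0b10, 1 << 232) == ((1 << 74) | 1))  # x * x^232
```

In this reading the accumulator is never shifted, which is why a second m-bit register is needed for the shifted copy of B.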

General BPWS finite field multiplier
Finite field: GF(2^m)
Word size: p
Components:
- One p × m partial product generator
- One adder (m XOR gates)
- One constant FFM
- One m-bit register

Comparisons
Multiplier | Finite field | Speed (clock cycles) | Circuit complexity
Parallel | GF(2^233) | 1 | 233^2 AND gates; 233^2 - 1 XOR gates
Serial | GF(2^233) | 233 | 233 AND gates; 234 XOR gates; two 233-bit registers
Proposed BPWS | GF(2^233) | 30 | 8*233 AND gates; 8*233 + 36 XOR gates; one 233-bit register
Alternative BPWS | GF(2^233) | 30 | 8*233 AND gates; 8*233 + 36 XOR gates; two 233-bit registers
General BPWS | Trinomial GF(2^m) | ceiling of m/p | p*m AND gates; p*m + (p+1)p/2 XOR gates; one m-bit register
For the general BPWS row, the field is defined by a trinomial $F(x) = x^m + x^k + 1$ with $k < m/2$, and p is the word width.
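The general BPWS row can be evaluated for any word width. The sketch below simply codes the formulas from the table, which the slides state for trinomial fields; the function name and dictionary keys are hypothetical.

```python
def bpws_complexity(m: int, p: int) -> dict:
    """Cycle and gate counts of the general BPWS multiplier (trinomial fields)."""
    return {
        "cycles": -(-m // p),                     # ceiling of m / p
        "and_gates": p * m,
        "xor_gates": p * m + (p + 1) * p // 2,
        "m_bit_registers": 1,
    }


# The proposed design: GF(2^233) with 8-bit words.
print(bpws_complexity(233, 8))

# Trade-off between speed and area for a few word widths.
for p in (1, 4, 8, 16, 32):
    print(p, bpws_complexity(233, p))
```

For m = 233 and p = 8 the formulas give 30 cycles, 8*233 AND gates, and 8*233 + 36 XOR gates, matching the Proposed BPWS row.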

VLSI Design
Target: an ASIC chip that performs multiplication and squaring in GF(2^233)
Specifications:
- Frequency: 5MHz
- Gate counts: 4
Design flow: CMC digital design flow
Technology: TSMC 0.18 µm CMOS technology

Hardware schematic: (figure)

Final results and comparisons
Multiplier | Frequency (MHz) | Field size | # of cells | Gate counts | Area (µm^2) | VLSI technology
BPWS 8 × 233 | 5 (max. 3) | 2^233 | 329 | 4893 | 89297.6439 | TSMC 0.18 µm CMOS
Squarer | | | 293 | 293 | 6437.5746 |
Classical 233 × 233 [1] | 77 | 2^233 | 37296 LUTs, 37552 FFs | 528427 | N/A | Xilinx FPGA XC2V6- ff57-4
Hans et al. MSD 64 × 256 [2] | 66.4 | 2^256 | 4797 LUTs, 2948 FFs | 3664 | N/A | Xilinx FPGA Virtex-II XCV2E-7
Souichi et al. 8 × 288 [3] | 3 | 2^576 | 2*8*288 ANDs, 2*8*288 XORs, 3*(8+288) FFs | 4544 | N/A | ALTERA FPGA EPFK25AG C5992

Chip layout: (figure)

Conclusions
- Bit-parallel word-serial multiplier architectures are proposed.
- The proposed architectures are useful not only for smart cards but also for other security processors.
- An ASIC chip containing the proposed BPWS multiplier and a bit-parallel squarer is implemented.
- A novel 8 × 233 partial product generator is designed.
- Expected future work is to use this multiplier in a security processor for smart cards.

References
[1] C. Grabbe, M. Bednara, J. Teich, J. von zur Gathen, J. Shokrollahi, "FPGA designs of parallel high performance GF(2^233) multipliers," Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), Volume 2, 25-28 May 2003.
[2] Hans Eberle, Sheueling Chang, Nils Gura, Sumit Gupta, Daniel Finchelstein, Edouard Goupy, Douglas Stebila, "An End-to-End Systems Approach to Elliptic Curve Cryptography," Sun Microsystems Laboratories, 2002-2003.
[3] Souichi Okada, Naoya Torii, Kouichi Itoh, Masahiko Takenaka, "Implementation of Elliptic Curve Cryptographic Coprocessor over GF(2^m) on an FPGA," in C.K. Koc and C. Paar (Eds.): CHES 2000, LNCS 1965, pp. 25-40, Springer-Verlag Berlin Heidelberg, 2000.

Question?

THANK YOU!