Towards strong security in embedded and pervasive systems: energy and area optimized serial polynomial multipliers in GF(2 k )

Similar documents
The Parity of the Number of Irreducible Factors for Some Pentanomials

Excess Error, Approximation Error, and Estimation Error

System in Weibull Distribution

Bit-Parallel Word-Serial Multiplier in GF(2 233 ) and Its VLSI Implementation. Dr. M. Ahmadi

Applied Mathematics Letters

1 Definition of Rademacher Complexity

Designing Fuzzy Time Series Model Using Generalized Wang s Method and Its application to Forecasting Interest Rate of Bank Indonesia Certificate

Three Algorithms for Flexible Flow-shop Scheduling

Chapter 12 Lyes KADEM [Thermodynamics II] 2007

Denote the function derivatives f(x) in given points. x a b. Using relationships (1.2), polynomials (1.1) are written in the form

BAYESIAN CURVE FITTING USING PIECEWISE POLYNOMIALS. Dariusz Biskup

An Optimal Bound for Sum of Square Roots of Special Type of Integers

COS 511: Theoretical Machine Learning

Time and Space Complexity Reduction of a Cryptanalysis Algorithm

Several generation methods of multinomial distributed random number Tian Lei 1, a,linxihe 1,b,Zhigang Zhang 1,c

Galois Field Hardware Architectures for Network Coding

Gadjah Mada University, Indonesia. Yogyakarta State University, Indonesia Karangmalang Yogyakarta 55281

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION

Computational and Statistical Learning theory Assignment 4

What is LP? LP is an optimization technique that allocates limited resources among competing activities in the best possible manner.

Design and Performance testing of Arithmetic Operators Library for Cryptographic Applications

4 Column generation (CG) 4.1 Basics of column generation. 4.2 Applying CG to the Cutting-Stock Problem. Basic Idea of column generation

Preference and Demand Examples

ON THE NUMBER OF PRIMITIVE PYTHAGOREAN QUINTUPLES

Theory and Algorithm for SPFD-Based Global Rewiring

FUZZY MODEL FOR FORECASTING INTEREST RATE OF BANK INDONESIA CERTIFICATE

XII.3 The EM (Expectation-Maximization) Algorithm

LECTURE :FACTOR ANALYSIS

1 Review From Last Time

NP-Completeness : Proofs

Slobodan Lakić. Communicated by R. Van Keer

International Journal of Mathematical Archive-9(3), 2018, Available online through ISSN

INPUT-OUTPUT PAIRING OF MULTIVARIABLE PREDICTIVE CONTROL

arxiv: v2 [math.co] 3 Sep 2017

Handling Overload (G. Buttazzo, Hard Real-Time Systems, Ch. 9) Causes for Overload

A New Design of Multiplier using Modified Booth Algorithm and Reversible Gate Logic

Introducing Entropy Distributions

Finite Vector Space Representations Ross Bannister Data Assimilation Research Centre, Reading, UK Last updated: 2nd August 2003

Notes on Frequency Estimation in Data Streams

Least Squares Fitting of Data

Quantum Particle Motion in Physical Space

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

The CORDIC method of calculating the exponential function Metoda CORDIC obliczania funkcji eksponencjalnej

Xiangwen Li. March 8th and March 13th, 2001

Structure and Drive Paul A. Jensen Copyright July 20, 2003

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

halftoning Journal of Electronic Imaging, vol. 11, no. 4, Oct Je-Ho Lee and Jan P. Allebach

A Differential Evaluation Markov Chain Monte Carlo algorithm for Bayesian Model Updating M. Sherri a, I. Boulkaibet b, T. Marwala b, M. I.

Determination of the Confidence Level of PSD Estimation with Given D.O.F. Based on WELCH Algorithm

One-sided finite-difference approximations suitable for use with Richardson extrapolation

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

Uncertainty in measurements of power and energy on power networks

Lecture 14: Forces and Stresses

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

On Syndrome Decoding of Punctured Reed-Solomon and Gabidulin Codes 1

Statistical analysis of Accelerated life testing under Weibull distribution based on fuzzy theory

The Impact of the Earth s Movement through the Space on Measuring the Velocity of Light

On Pfaff s solution of the Pfaff problem

Clock-Gating and Its Application to Low Power Design of Sequential Circuits

,..., k N. , k 2. ,..., k i. The derivative with respect to temperature T is calculated by using the chain rule: & ( (5) dj j dt = "J j. k i.

Pulse Coded Modulation

Our focus will be on linear systems. A system is linear if it obeys the principle of superposition and homogenity, i.e.

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 3, Issue 1, July 2013

EXACT TRAVELLING WAVE SOLUTIONS FOR THREE NONLINEAR EVOLUTION EQUATIONS BY A BERNOULLI SUB-ODE METHOD

PROBABILITY AND STATISTICS Vol. III - Analysis of Variance and Analysis of Covariance - V. Nollau ANALYSIS OF VARIANCE AND ANALYSIS OF COVARIANCE

Revision: December 13, E Main Suite D Pullman, WA (509) Voice and Fax

y new = M x old Feature Selection: Linear Transformations Constraint Optimization (insertion)

On the correction of the h-index for career length

Collaborative Filtering Recommendation Algorithm

Finite Fields and Their Applications

AN ANALYSIS OF A FRACTAL KINETICS CURVE OF SAVAGEAU

VERIFICATION OF FE MODELS FOR MODEL UPDATING

The Synchronous 8th-Order Differential Attack on 12 Rounds of the Block Cipher HyRAL

Performance Analysis of V-BLAST with Optimum Power Allocation

Fermi-Dirac statistics

Scroll Generation with Inductorless Chua s Circuit and Wien Bridge Oscillator

COMP th April, 2007 Clement Pang

Study of the possibility of eliminating the Gibbs paradox within the framework of classical thermodynamics *

The Non-equidistant New Information Optimizing MGM(1,n) Based on a Step by Step Optimum Constructing Background Value

Algorithm for reduction of Element Calculus to Element Algebra

Formulas for the Determinant

Identifying assessor differences in weighting the underlying sensory dimensions EL MOSTAFA QANNARI (1) MICHAEL MEYNERS (2)

EXAMPLES of THEORETICAL PROBLEMS in the COURSE MMV031 HEAT TRANSFER, version 2017

Chapter 13. Gas Mixtures. Study Guide in PowerPoint. Thermodynamics: An Engineering Approach, 5th edition by Yunus A. Çengel and Michael A.

Numerical Heat and Mass Transfer

Analysis of the Magnetomotive Force of a Three-Phase Winding with Concentrated Coils and Different Symmetry Features

Least Squares Fitting of Data

Temperature. Chapter Heat Engine

Implicit scaling of linear least squares problems

NUMERICAL DIFFERENTIATION

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

On the Multicriteria Integer Network Flow Problem

Convexity preserving interpolation by splines of arbitrary degree

Pop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing

Cryptanalysis of pairing-free certificateless authenticated key agreement protocol

Worst Case Interrupt Response Time Draft, Fall 2007

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18

Solving Fuzzy Linear Programming Problem With Fuzzy Relational Equation Constraint

An efficient algorithm for multivariate Maclaurin Newton transformation

Transcription:

Towards strong securty n ebedded and pervasve systes: energy and area optzed seral polynoal ultplers n GF( k ) Zoya Dyka, Peter Langendoerfer, Frank Vater and Steffen Peter IHP, I Technologepark 5, D-53 Frankfurt (Oder), Gerany http://www.hp-croelectroncs.co/ Abstract In ths paper we dscuss a theoretcal approach to do early assessent of the area and power consupton of hardware accelerators for ellptc curve cryptography. For evaluaton we developed several dfferent one clock ultplers as buldng block for the fnal seral ultplers. The forer are evaluated concernng ther effcency n coparson to lterature and the results of our assessent technque are copared wth synthess results. Snce the applcaton areas we are nterested n are ebedded systes, pervasve coputng or wreless sensor networks we nvestgated seral ultplers n order to acheve reduced area and energy consupton copared to the sngle clock ultplers. Our evaluaton clearly shows that our approach provdes good hnts to select well suted pleentaton canddates. Keywords: ECC, GF( k ), polynoal ultplcaton I. INTRODUCTION Ebedded systes and wreless sensor networks have ganed sgnfcant nterest for beng used n crtcal applcaton areas such as ndustral control systes. Relablty and by that securty of those systes s of extreely hgh portance. But, for resource constrant devces the coputatonal effort of strong securty eans, especally when publc key cryptography s used are stll too costly. Durng recent years ellptc curve cryptography has ganed sgnfcant attenton. Ths s due to the fact that t provdes hgher securty than RSA whle consung sgnfcant fewer resources. I.e. ts keys are shorter and the coputatonal overhead s less. But, for resource constrant devces such as ebedded systes or wreless sensor nodes the coputatonal effort when realzed n software s stll too hgh. The alternatve s to pleent ECC operatons as hardware accelerators. The polynoal ultplcaton s costly and often executed when ECC s appled. Thus, t s the functon that has the hghest pact on area and/or energy consupton and by that t s the ost often researched functon for hardware pleentaton of ECC. Fndng the optu hardware accelerator s a te consung and dffcult task. On the one hand t needs to be as sall as possble to reduce producton costs, on the other hand t needs to be as energy effcent as possble snce the target syste s battery powered. One way to prove the hardware pleentaton of ECC s the optzaton of polynoal ultplcaton. There exst several approaches to nze the coplexty of the polynoal ultplcaton []-[8]. A very prosng approach to optze polynoal ultplers s to cobne these ethods [9]. If an optu cobnaton s found ths reduces the area and energy consupton sgnfcantly []. Another eans to reduce the area of ultplers s to prolong the calculaton te,.e. to spend several clock cycles to calculate the polynoal product as a su of partal products [], []. Deternng the ultpler wth the nal energy and/or area consupton s a tedous task especally when several ultplcaton ethods are taken nto account. In ths paper we present a eans to assess the gate coplexty of polynoal ultplers and dscuss ts applcablty usng 5 desgn alternatves as well as synthess results of the ost prosng pleentaton canddates. Our evaluaton shows that the coplexty assessent provdes good results well suted to select fro dfferent desgn choces. The rest of ths paper s structured as follows. In secton II we shortly ntroduce the background of polynoal ultplcaton and our coplexty assessent approach. After that we dscuss and evaluate one clock ultplers that we use later for realzng seral ultplers, and ther theoretcal gate coplexty. Then we nvestgate the resultng seral ultplers and ther gate coplexty. In secton III we evaluate our approach usng data fro the lterature as well as synthess results. The paper closes wth short conclusons. II. POLYNOMIAL MULTIPLIERS A. Defnton of polynoal ultplcaton Frst of all we gve the defnton of polynoal ultplcatons over extended bnary felds GF( k ). k k Let A( a x and B ( b x be eleents fro GF( k ) and f( be the rreducble polynoal of degree k generatng GF( k ). The ultplcaton of the eleents A( and B( s defned as follows: C '( A( B( od f ( C( od f (. () 978--4673-9-6//$3. IEEE

The ultplcaton () can be perfored n two steps: the calculaton of the polynoal C( of degree (k-) and reducton of ths polynoal C( to the degree k. In ths paper we concentrate on the optzaton of the frst step of the ultplcaton only,.e. on the polynoal ultplcaton: C ( A( B( k c x, wth c a j bl k> j, k l j+ > l, The sybol n () denotes the Boolean operaton. Eleents of GF( k ) can be (and are usually) represented as k- k bts long bnary nubers: A (. a B. Gate coplexty of polynoal ultplcaton We descrbe the gate coplexty () of polynoal ultplcaton for k-bts operands by the exact nuber of the Boolean operatons # and by the exact nuber of the Boolean operatons #, because only these two operatons are needed to calculate the polynoal product (see ()) and because they are pleented n hardware by an or an gate respectvely. () (# ; ). (3) MM k # In (3) the MM shows the used ultplcaton ethod. The nal area and energy consupton of a ultpler can be calculated based on ts gate coplexty (3) and on the area and the energy consupton of the gates (Area, Area and E, E ) of the appled technology: Area E # # Area + # Area. (4) E + # E We use the followng odel for the average energy n (4): In our odel are consderng two nput and gates only. The energy consupton of the gates depends on ther nput values. If the nput values do not trgger any state change the consued energy s neglgble, otherwse.e. f nput values trgger state changes the consued energy s by far hgher and depend on the logc functon pleented by the gate and on whether the output changes fro zero to one or vce versa. In order to estate the energy consupton of desgn alternatves we use the average energy consupton of and gates (E, E ) whch we calculated fro the energy consupton of the followng four state changes for the IHP.3µ technology: - frst nput s ; second nput s and output changes to - frst nput s ; second nput s and output changes to - frst nput s ; second nput s and output changes to - frst nput s ; second nput s and output changes to. An addtonal challenge when calculatng the energy consupton stes fro the fact that not all gates are always flppng when the polynoal product s calculatng. Ths fact can be taken to account usng a statstcal coeffcent. For exaple, f n average only 5% of all gates are flppng forula (4) can be odfed as follows: E / (# E + # E ). The proble s that the nuber of the flppng gates of the polynoal ultpler depends on both ultplcands,.e. on the nputs. The usng of such statstcal coeffcents n the energy consupton odel of polynoals ultplers ght result n a sgnfcant dfference of the theoretcal and practcal results snce the latter are obtaned for the concrete ultplcand values. We are aware of the fact that our odel only very roughly approxates the real energy consupton but snce we are anly nterested n coparng pleentaton alternatves, and not n predctng concrete energy consupton values, we are convnced that our odel s well suted for assessng pleentaton canddates. There exst any ultplcaton ethods (MM) that can be appled to reduce the coplexty of polynoal ultplers: the classcal and the generalzed Karatsuba MM [] for n>; Karatsuba MM [] for - and Wnograd MM [3] for 3-ter operands, the latter two are the specal cases of the generalzed Karatsuba MM; Montgoery MM for 5-, 6- and 7-ter operands [5]. These approaches segent the ultplcands nto n saller -bts long parts (ters), whch are then ultpled. Of all these MMs the classcal MM requres the bggest nuber of partal ultplcatons. Other MMs lead to a reduced nuber of partal ultplcatons but requre ore -operatons copared to the classcal MM. The gate coplexty of these Karatsuba-lke ultplcaton ethods can be also descrbed usng a slghtly odfed.e. recursve defnton of the gate coplexty: (# MULT ; ) MM k n #, where (5) MM j MM the used MM; #MULT the nuber of partal ultplcatons, and MM j the gate coplexty of partal ultplers. The nuber of partal ultplcatons #MULT and the nuber of gates # depend on the selected ultplcaton ethod MM and on the segentaton n of the operands. The MM j knowledge of the gate coplexty of each partal ultpler as tuple (3) allows calculatng the gate coplexty of full k-bts ultplers. C. One clock ultplers A one clock ultpler, whch pleents the ultplcaton ethod MM and contans #MULT of partal ultplers (see (5)), can calculate the polynoal product n a sngle clock cycle. Each partal ultplcaton can be pleented by any ultplcaton ethod MM j or even by any cobnaton of ultplcaton ethods. Optal cobnatons of dfferent MMs can be deterned algorthcally [9]. In [] we presented the deternaton of area and/or energy optal cobnatons of 8 MMs wth reduced nuber of operatons.

In ths paper we descrbe the cobnaton of 3 MMs as descrbed n []. We deterned the area-optal cobnatons of these MMs for the IHP.3µ technology for all lengths of operands up to 6 bts. In order to benchark our results we are usng results fro [] and [9]. Snce these artcles gve the gate coplexty only for operands of a length up to 8 bts we reconstructed the gate coplexty for longer operands.e. up to 6 bts. We calculated the area of ultplers based on the reconstructed gate coplexty usng eq. (4). Fg. shows the calculated area of ultplers for polynoals of a length of up to 6 bts for both approaches used as bencharks and for our approach. The red curve n Fg. depcts the chp-area of ultplers that we calculated based on gate coplexty reconstructed fro []. The black curve depcts results we derved fro [9]. The area consupton of our approach (green curve n Fg.) s 33 per cent saller than the results represented by the red curve and about per cent saller than the results represented by the black curve. synthess of the ultplers apples often a sngle 3-nputs- gate nstead two -nputs- gates, reducng the resultng area and changng the energy consupton. The average devaton of the area of syntheszed ultplers fro the calculated one s about 7 %. The devaton of the energy values s uch larger,.e. approxately 6%. Ths devaton s caused by the followng facts: - not all gates are flppng causng the reducton of the theoretcally deterned energy consupton value - the energy consupton of each gate depends on ts nput values and by that the average value used n the calculaton already devates stronger fro the real values as n the case of the constant area paraeter. To obtan energy consupton values fro Table I we perfored a sequence of ore than ultplcatons usng rando nputs. For each gate of the ultplers ts energy consupton was collected. Based on these values the total power consupton of the ultpler and ts energy consupton per sngle clock cycle (.e. per ndvdual ultplcaton) were deterned. Table II shows the relaton of theoretcally calculated values and synthess results for area and energy for nvestgated ultplers usng the 64-bt partal ultpler as noralzaton value. TABLE II. RELATIONS OF PARAMETERS OF INVESTIGATED MULTIPLIERS Fgure. Calculated area of ultplers for an operand length of up to 6 bts for our approach (green lne) and approaches presented n [] (red lne) and [9] (black lne) respectvely Table I shows the paraeters of the syntheszed polynoal ultplers that we used as -bts partal ultplers n our seral n-bts polynoal ultpler. TABLE I., bts PARTIAL MULTIPLIERS: CALCULATED PARAMETERS SYNTHESIZED VALUES Gate Coplexty Theory area, energy, pj area, Synthess energy, pj 64 64(96 ; 387 ).46 35.3.396 9.8 59 6(35 ; 94 ).39 3.76.36 7. 78 8(5 ; 3396 ).66 5.34.575 4.9 The area and energy of all syntheszed ultplers are saller than the theoretcal calculated values. Ths can be explaned by the followng fact: the calculaton of the area of ultplers s based on the area of -nputs- and -nputs- gates. In realty each technology provdes also other gate types. For exaple, the IHP.3µ technology [3] has also 3-nputs- gates. The Synopss tools [4], that we use for the, bts area, energy, pj Theory Synthess Theory Synthess 64 59.9.9.93.9 78.58.59.57.58 D. Seral ultplers All partal products can be calculated seral wthn #MULT clock cycles usng a sngle -bts partal ultpler only (see (5)). Ths reduces the area of the k-bts polynoal ultpler sgnfcantly [], []. In seral ultplers the polynoal product wll be accuulated n output regsters. We explan ths process usng the exaple of the Wnograd MM for 3-ter polynoals,.e. for the case of the segentaton of the operands nto n3 - bts long ters. The ultplcaton forula for ths case s the followng: C ( C 5 C 4 C C C C A A A B B B A B 3 6 ( A B A B ( A A )( B B )) ( A B A B A B ( A A )( B B )) 3 4 ( A B A B ( A A )( B B )) A B Each partal product A B (or j (A A j )(B A j ) ) n (6) s (-)-bts long and can be wrtten as a su of two segents: the (-) ost sgnfcant bts, whch wll be denoted as [] and the reanng -bts denoted as []. (6) by results of [] we denote the data provded for the recursvely appled generalzed Karatsuba MM wth nal nuber of and gates.

[ ] + -bts bts bts (7) Eq. (8) s derved fro (6) by applyng (7): C [] C( [ ] [ ] [ ] C3 [] [] [] C 3 [] C4 [] [] [] [] 4 C C5 5 Eq. (8) can also be represented as a table In the rest of ths paper naed as table representaton, (TR), see Table II. The leftost colun shows both segents of the partal products,.e. all [d] and j [d], where d {,}. All other coluns correspond to an ndvdual product segent. If the product segent C s calculated by addng a partal product segent j [d], the cell of the TR wth 'coordnates' ( j [d], C ) s flled wth the sybol. Otherwse the cell s epty. TABLE III. TABLE REPRESENTATION OF 3-TERM-WINOGRAD-MM, INDICATES WHICH PARTIAL PRODUCTS NEED TO BE ADDED TO RETRIEVE C A B [] [] A B [] [] A B [] [] A B [] [] A B [] [] A B [] [] (A A ) (B B ) [] [] (A A ) (B B ) [] [] (A A ) (B B ) [] [] (A A ) (B B ) [] [] (A A ) (B B ) [] [] (A A ) (B B ) [] [] C 5 C 4 C 3 C C C Table III contans 6 dfferent partal products. They can be calculated wthn 6 clock cycles usng a sngle partal ultpler. In ths case only one partal product s avalable at the sae te. For each segent C the su of necessary segents of partal products can be accuulated wthn 6 clock cycles. Fg. shows the structure of the seral ultpler and Table IV shows the correspondng accuulaton plan. Fg. 3 and 4 show the accuulaton unts for dfferent segents of the polynoal products accordng to the accuulaton plan. (8) Clock cycle Fgure. Structure of our seral polynoal ultpler TABLE IV. ACCUMULATION PLAN FOR CALCULATION OF THE POLYNOMIAL PRODUCT ACCORDING TO TABLE II Avalable partal product Accuulaton plan C C []; C C [][]; C C [][]; C 3C 3[] C C []; C C [][]; C 3C 3[][]; C 4C 4[] 3 C C []; C 3C 3[][]; C 4C 4[][]; C 5C 5[] 4 C C []; C C [] 5 C C []; C 3C 3[] 6 C 3C 3[]; C 4C 4[] Fgure 3. The accuulaton unt for C : ths unt contans gates, gates and flp-flops to realze the -bts regster. The structure of the accuulaton unt for C n- s the sae, but the nput value s the (-)-bts long []. Fgure 4. Accuulaton unt for C, <<n-; ths unt contans (+-) gates, (-+) gates and flp-flops to realze the -bts regster. The gate coplexty of the k-bts seral polynoal ultplers can be descrbed accordng to ts structure (see Fg. ) as follows: n n + + Control calculaton Accuulat on unt of nputs for partal unt for C ultpler + (9)

The gate coplexty of the control unt s very sall copared to the area of other unts. Therefore (9) can be wrtten for the pleentaton of 33-bts ultplcaton as follows: n + ( 33 ; ( 33 ) ) + (( ) ; ( ) ) + ( ( ) ; ( ) ) + ( 467 + 4n ( n + ) ; 34 + 4n n 3 ) ( n ; ) + + () For the descrbed segentaton of ultplcands nto n3 ters the length of segents n () s equal to 78 bts. Eq. () can also be used for the calculaton of the gate coplexty of other 33-bts seral ultplers, f they have the sae structure. Eq. () doesn t contan the nuber of flp-flops (see Fg. 3 and 4) because each ultpler contans the sae nuber of flp-flops for the output values ndependently of the pleented ultplcaton ethod. III. EVALUATION OF THEORETICAL DATA In ths secton we dscuss the theoretcal perforance paraeters of the ultplers we nvestgated. The executon te of the ultplcaton s an portant paraeter. We nvestgated the seral polynoal ultplers, whch can calculate the polynoal product n less than 3 clock cycles. We calculated the area and energy consupton for the followng MMs usng eq. (): - n-ter Karatsuba MM wth <n 7 - n-ter Wnograd MM wth <n 7 - generalzed n-ter Karatsuba MM wth <n 7 - n-ter Montgoery MM wth 5 n 7 We obtaned the ultplcaton forulae for the n-ter Karatsuba MM n the followng way: we derved the ultplcaton forulae for the 8-ter ultplcands usng the Karatsuba ultplcaton forula recursvely. The expresson for ultplcaton of 8-ter-polynoals contans 7 partal products of -bt long ters of ultplcands. The expresson for the polynoal product of 7-ter long operands can be obtaned fro the expresson for 8-ters long operands by settng the ost sgnfcant ters of both polynoals to zero and so on. We obtaned the n-ter Wnograd MM forulae n the sae way. The seral ultplers have the sae structure (see Fg. ). The gate coplexty of these ultplers does not depend on the appled MM and s approxately the sae usng the sae segentaton and the sae optzed partal ultpler (see eq. ). The area of those ultplers s also approxately the sae. Ther energy consupton depends on the nuber of clock cycles, whch are necessary to accuulate the polynoal product. The nuber of clock cycles s equal to the nuber of partal ultplcatons #MULT of the appled MM (see eq.5) and s gven n table IV for each of the nvestgated MMs and also for the classcal MM. The forer deternes the energy consupton of ultplers,.e. t depends on the appled MM and s shown n Table V for each nvestgated MM. TABLE V. NUMBER OF CLOCKS FOR INVESTIGATED MM Investgated MM Nuber of clocks for the segentaton n n n3 n4 n5 n6 n7 classcal 4 9 6 5 - - n-ter Karatsuba 3 6 9 4 8 3 n-ter Wnograd 3 6 9 4 8 3 generalzed n-ter Karatsuba 3 6 5 4 n-ter Montgoery - - - 3 7 For all nvestgated ultplers we calculated the area and energy consupton usng ther gate coplexty and the gate paraeters of the IHP.3µ technology. In addton we calculated the gate coplexty of the seral ultpler that calculates the product usng the classcal MM. Such a ultpler doesn t contan the unt for calculatng the nputs of the partal ultpler. Also ts accuulaton unts are less coplex than those needed for other MMs (see Fg. 4). We calculate the gate coplexty of the seral classcal ultpler usng (): clas n TM + (n ) MUX + () + ((465 + n) ;(465 + n ) ) Fg. 5 shows the calculated area and energy consupton as a functon of clock cycles requred by each polynoal ultpler for the accuulaton of the product. The red curve shows the energy consupton of all nvestgated ultplers and the green curve shows ther area. The area and energy consupton of the optzed one clock 33-bts ultpler are gven for coparson. Also the seral ultpler presented n [] s shown n Fg.5. Fgure 5. Calculated chp-paraeters of all nvestgated seral polynoal ultplers As selecton crteron for the best ultpler we selected not only the area-power product but also the executon te of the ultplcaton. We deterned canddates for pleentaton: the 6-clock cycles ultpler usng 3-ter- Wnograd MM (36 pj) and the 9-clock cycles ultpler usng 4-ter-Karatsuba MM (3 pj). Note that the area of other -clock ultplers (>9) s argnally saller than the area of the 9-clock cycles ultpler but ther energy consupton, area-power product and calculaton te are uch larger. For coparson, the 9-clock cycles ultpler presented n [] has an areapower product of 37 pj.

For the evaluaton of our theoretcal results we syntheszed the 6- and 9-clock cycles seral ultplers for the IHP.3µ technology usng Synopss tools [4]. Addtonally we syntheszed the desgn fro [] as benchark for our practcal results. Table VI shows the area and energy consupton of the syntheszed polynoal ultplers and the chp paraeters of the kp-desgns usng these polynoal ultplers. TABLE VI. PARAMETERS OF SYNTHESIZED DESIGNS Operaton Desgn detals, bts area, Energy 9-clock ultpler (as n []) 64.4.5 nj polynoal 9-clock ultpler (ths work) 59.9.95 nj ultplcaton 6-clockultpler (ths work) 78.33.7 nj usng 9-clock ultpler ( fro []) 64.48.9 µj kp usng 9-clock ultpler (ths work) 59.43.4 µj usng 6-clock ultpler (ths work) 78.66.74 µj The coparson of the chp-paraeters of syntheszed ultplers and kp-desgns reveals: - Applyng the optzed 59-bts partal ultpler nstead of the 64-bt partal ultpler n the 9-clock cycles seral polynoal ultpler reduces the area of the ultpler about 4.4% and at the sae te the energy consupton about 9.5%. - Applyng our seral 6-clock cycles polynoal ultpler nstead of the 9-clock ultpler fro [] reduces the energy consupton of a sngle polynoal ultplcaton about 3.4%. The energy consupton of the full kp-desgns and the executon te of the kp-operaton are reduced about 4% and 8%, respectvely. [4] Sunar, B.: A Generalzed Method for Constructng Subquadratc Coplexty GF( k ) Multplers. IEEE Transactons on Coputers, Vol. 53, Nr. 9, 4, pp: 97-5 [5] Montgoery, P.L.: Fve, Sx, and Seven-Ter Karatsuba-Lke Forulae. IEEE Transactons on Coputers, Vol. 54, Nr. 3, 5, pp: 36-369 [6] Hanng Fan and M. Anwar Hasan. Coents on Fve, sx, and seventer Karatsuba-lke forulae. IEEE Trans. Coput., 56 (5), 7, pp:76 77 [7] Cenk, M., Koc, C. K., Ozbudak, F.: Polynoal ultplcaton over fnte felds usng feld extensons and nterpolaton, n 9th IEEE Syposu on Coputer Arthetc, IEEE Coputer Socety Press, Portland, Oregon, 9, pp: 84-9. [8] Oseledets, I.: Iproved n-ter Karatsuba-lke Forulae n GF(). IEEE Transactons on Coputers, Vol. 6, Nr. 8,, pp: -6 [9] J. von zur Gathen and J. Shokrollah: Effcent FPGA-based Karatsuba ultplers for polynoals over F, n Selected Areas n Cryptography (SAC) th Internatonal Workshop, Canada, 5, Revsed Selected Papers, Vol. 3897 of LNCS, Sprnger-Verlag, 6, pp: 359 369 [] Dyka, Z., Langendoerfer, P.: Algorthcally deternng the optu polynoal ultpler n GF( k ), extended abstract, WEWORC, avalable at http://.weworc.org/ [] Dyka, Z., Langendoerfer, P.: Area effcent hardware pleentaton of ellptc curve cryptography by teratvely applyng Karatsubas ethod, In Proceedngs of the Desgn, Autoaton and Test n Europe (DATE 5), 5, Vol.3, pp: 7-75 [] Peter, S., Langendoerfer, P.: An Effcent Polynoal Multpler GF() and ts Applcaton to ECC Desgns. In Proceedngs of the Desgn, Autoaton and Test n Europe (DATE 7), 7, pp:53-58 [3] Innovatons for Hgh Perforance Mcroelectroncs, http://www.hpcroelectroncs.co/ [4] Synopss, http://www.synopsys.co/ CONCLUSION In ths paper we have presented a theoretcal approach to do early assessng of the area and power consupton of hardware accelerators for ellptc curve cryptography. We developed one clock ultplers for up to 6 bts long operands usng optal cobnatons of 3 ultplcaton ethods. We have calculated the area and energy consupton of 5 seral polynoal ultplers for the ultplcaton n GF( 33 ) that we use as an exaple. Ths approach can be appled also for the developent of a seral ultpler for other operand length. We used the results to select three ultplers for further evaluaton. These ultplers were syntheszed and the synthess values are gven n the artcle. The coparson of the theoretcal calculated data and the syntheszed values shows that our approach provdes a good and relable eans to do an early assessent for selectng pleentaton canddates well suted for resource constrant, low cost devces such as ebedded systes or wreless sensor nodes. REFERENCES [] Weerskrch, A., Paar, C.: Generalzatons of the Karatsuba Algorth for Effcent Ipleentatons. Report 6/4, Cryptology eprnt Archve, 6, http://eprnt.acr.org/6/4.pdf [] Karatsuba, A., Ofan, Y.: Multplcaton of Many-Dgtal Nubers by Autoatc Coputers. Doklady Akad. Nauk SSSR, Vol. 45 (96), pp: 93 94. Translaton n Physcs-Doklady, 7 (963), pp: 595 596. [3] Wnograd, S.: Arthetc Coplexty of Coputatons. SIAM, 98