Table-Based Polynomials for Fast Hardware Function Evaluation
ASAP 05
Jérémie Detrey, Florent de Dinechin
Projet Arénaire, LIP (UMR CNRS, ENS Lyon, UCB Lyon, INRIA)
Centre National de la Recherche Scientifique, École Normale Supérieure de Lyon
Overview
- Context
- The HOTBM method
- Results
- Conclusion
Jérémie Detrey, Florent de Dinechin Table-Based Polynomials for Fast Hardware Function Evaluation 1 / 34
Context
Context: function evaluation
- fixed-point elementary functions: $\sin(x)$, $\cos(x)$, $\log(x)$, $e^x$, ...
  - signal or image processing
  - neural networks
- dedicated computations
  - logarithmic number system: $\log_2(1 + 2^x)$ and $\log_2(1 - 2^x)$, ...
- the operator maps a $w_I$-bit input X to a $w_O$-bit output f(X): which hardware should compute it?
- usually $w_I = w_O$ and $8 \leq w_I, w_O \leq 32$
Order 0: direct look-up table
- tabulate all the possible values $f(0), f(1), \dots, f(2^{w_I} - 2), f(2^{w_I} - 1)$, each on $w_O$ bits
- very short critical path: only 1 table look-up
- huge look-up table: $w_O \cdot 2^{w_I}$ bits
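As a minimal sketch of the order-0 method (the helper names are hypothetical), the table below stores $f(i / 2^{w_I})$ rounded to nearest on $w_O$ fractional bits for every input code i, and the second helper simply evaluates the $w_O \cdot 2^{w_I}$ size formula above.

```python
import math
from typing import Callable, List


def build_direct_table(f: Callable[[float], float], w_in: int, w_out: int) -> List[int]:
    """Order-0 method: one entry per w_in-bit input code, rounded to w_out fractional bits."""
    scale = 1 << w_out
    # Python's round() rounds to nearest (ties to even).
    return [round(f(i / (1 << w_in)) * scale) for i in range(1 << w_in)]


def direct_table_bits(w_in: int, w_out: int) -> int:
    """Table size in bits: 2**w_in entries of w_out bits each."""
    return w_out << w_in


if __name__ == "__main__":
    table = build_direct_table(lambda x: math.log2(1 + x), w_in=8, w_out=8)
    print(len(table), table[128])
    for w in (8, 16, 24, 32):
        print(f"w_I = w_O = {w}: {direct_table_bits(w, w)} bits")
```

The size printout makes the point of the slide: at 16 bits the table already holds $16 \cdot 2^{16} = 1\,048\,576$ bits, and the size grows exponentially with $w_I$.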
Order 1: lookup-multiply method
- piecewise linear approximation: the input X is split into its most significant part A and its least significant part B, and $f(X) \approx K_0(A) + K_1(A) \cdot B$
- the tables $K_0(A)$ and $K_1(A)$ each output $w_O + g$ bits
- smaller tables
- longer critical path: 1 table look-up, 1 multiplication and 1 addition
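A floating-point sketch of the lookup-multiply idea follows. It assumes (our choice, not stated on the slide) that $K_0(A)$ is the value of f at the start of the sub-interval and $K_1(A)$ the secant slope across it; hardware would store both on $w_O + g$ bits rather than as floats. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def build_linear_tables(f: Callable[[float], float], alpha: int) -> Tuple[List[float], List[float]]:
    """For each of the 2**alpha sub-intervals [A, A + h): K0(A) = f(A), K1(A) = secant slope."""
    h = 2.0 ** -alpha
    k0 = [f(a * h) for a in range(1 << alpha)]
    k1 = [(f((a + 1) * h) - f(a * h)) / h for a in range(1 << alpha)]
    return k0, k1


def eval_lookup_multiply(k0: List[float], k1: List[float], x: int, w_in: int, alpha: int) -> float:
    """One look-up (indexed by A), one multiplication by B, one addition."""
    beta = w_in - alpha
    a = x >> beta                    # A: alpha most significant bits
    b = x & ((1 << beta) - 1)        # B: remaining bits
    offset = b * 2.0 ** -w_in        # position of X inside its sub-interval
    return k0[a] + k1[a] * offset


if __name__ == "__main__":
    f = lambda x: math.log2(1 + x)
    k0, k1 = build_linear_tables(f, alpha=6)
    worst = max(abs(eval_lookup_multiply(k0, k1, x, 12, 6) - f(x / 4096)) for x in range(4096))
    print(f"max error: {worst:.3e}")
```

For linear interpolation the error is at most $h^2/8 \cdot \max|f''|$ with $h = 2^{-\alpha}$; for $\log_2(1 + x)$ on [0, 1], $\max|f''| = 1/\ln 2$, so with $\alpha = 6$ the error stays below about $4.4 \cdot 10^{-5}$ using two 64-entry tables.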
Order 1: bipartite table method [2]
- tabulate the product in a table of offsets (TO), addressed by $A_0$ (the most significant bits of A) and by B
- $f(X) \approx TIV(A) + TO(A_0, B)$, where TIV is the table of initial values; both tables output $w_O + g$ bits
- shorter critical path: 1 table look-up and 1 addition
- slightly larger tables
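A minimal floating-point sketch of the bipartite decomposition, under assumptions of our own: $TIV(A)$ holds f at the start of each fine sub-interval, and $TO(A_0, B)$ holds the offset computed with the derivative taken at the middle of the coarser sub-interval selected by $A_0$. The multiplication of the lookup-multiply method is thus replaced by a second table. Function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def build_bipartite(f: Callable[[float], float], df: Callable[[float], float],
                    w_in: int, alpha: int, alpha0: int) -> Tuple[List[float], List[List[float]]]:
    assert 0 < alpha0 <= alpha < w_in
    beta = w_in - alpha
    tiv = [f(a * 2.0 ** -alpha) for a in range(1 << alpha)]
    # TO(A0, B): slope at the middle of the coarse sub-interval A0, times the offset given by B.
    to = [[df((a0 + 0.5) * 2.0 ** -alpha0) * (b * 2.0 ** -w_in) for b in range(1 << beta)]
          for a0 in range(1 << alpha0)]
    return tiv, to


def eval_bipartite(tiv: List[float], to: List[List[float]], x: int,
                   w_in: int, alpha: int, alpha0: int) -> float:
    beta = w_in - alpha
    a = x >> beta                   # A
    a0 = a >> (alpha - alpha0)      # A0: the alpha0 most significant bits of A
    b = x & ((1 << beta) - 1)       # B
    return tiv[a] + to[a0][b]       # two parallel look-ups, one addition


if __name__ == "__main__":
    f = lambda x: math.log2(1 + x)
    df = lambda x: 1 / ((1 + x) * math.log(2))
    tiv, to = build_bipartite(f, df, w_in=12, alpha=8, alpha0=4)
    worst = max(abs(eval_bipartite(tiv, to, x, 12, 8, 4) - f(x / 4096)) for x in range(4096))
    print(f"TIV: {len(tiv)} entries, TO: {len(to) * len(to[0])} entries, max error: {worst:.3e}")
```

Using one slope per coarse sub-interval costs at most $2^{-\alpha} \cdot 2^{-\alpha_0} / 2 \cdot \max|f''|$ of extra error, which is the price paid for removing the multiplier.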
Order 1: multipartite table method [14, 11, 4]
- split the linear offset (TO) into a sum of several offsets $TO_i$
- B is split into sub-words $B_0, B_1, B_2$; each $TO_i$ is addressed by $(A_i, B_i)$, with $A_i$ taken from the bits of A, and XOR gates exploit the symmetry of each offset table
- $f(X) \approx TIV(A) + TO_0(A_0, B_0) + TO_1(A_1, B_1) + TO_2(A_2, B_2)$, all tables outputting $w_O + g$ bits
- critical path: 2 XOR stages, 1 table look-up and $\log_2(n)$ levels of addition for n tables
- much smaller tables, but an adder tree
Order 2: SMSO method [6]
- split the order-1 term into the sum of a small product and an offset
- $TIV(A)$ gives the initial value; a table of slopes TS, addressed by $A_0$, is multiplied by $B_0$ in a small rectangular multiplier; the offset tables $TO_1(A_1, B_1)$ and $TO_2(A_2, B_2)$ use XOR symmetry as in the multipartite method
- critical path: 1 table look-up, 1 rectangular multiplication and 2 additions
- a multiplier, but smaller tables
Higher-order methods
- Horner evaluation
- interleaved memory interpolators: Lewis
- partial product arrays: Hassler and Takagi
- specialized squaring unit: Piñeiro, Bruguera and Muller
- this work
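Horner's rule, the first approach listed, evaluates $c_0 + c_1 y + \dots + c_n y^n$ as $c_0 + y(c_1 + y(c_2 + \dots))$: essentially n multiply-add steps in sequence, so in hardware the critical path grows with the order. A minimal Python sketch (hypothetical function name):

```python
from typing import Sequence


def horner(coeffs: Sequence[float], y: float) -> float:
    """Evaluate sum(coeffs[k] * y**k) with one multiply-add per coefficient, highest degree first."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * y + c
    return acc


if __name__ == "__main__":
    print(horner([1.0, 2.0, 3.0], 2.0))  # 1 + 2*2 + 3*2**2
```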
Objectives
- higher-order approximation, for larger precisions and smaller tables
- accurate error analysis, to help optimize the hardware cost
- split large operators into smaller ones, for architectural exploration
The HOTBM method (Higher-Order Table-Based Method)
Polynomial approximation
[Figure: f(x) on [0, 1], with the input range split into 8 sub-intervals of width 1/8.]
- input word decomposition: the $w_I$-bit input X is split into A, its $\alpha$ most significant bits, and B, its $\beta$ least significant bits ($w_I = \alpha + \beta$):
$$X = A + 2^{-\alpha} B = .a_1 a_2 \dots a_\alpha b_1 b_2 \dots b_\beta$$
- piecewise order-n minimax polynomial approximation:
$$f(X) \approx P^{(A)}(B) = \sum_{k=0}^{n} K_k(A) \cdot \left(2^{-\alpha} B\right)^k$$
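The decomposition $X = A + 2^{-\alpha} B$ and the piecewise polynomial can be sketched as follows for $f(x) = \log_2(1 + x)$. This is only an illustration: where the slides use a minimax polynomial on each sub-interval, this sketch substitutes the Taylor expansion at the start A of the sub-interval, whose coefficients $K_k(A) = (-1)^{k+1} / (k \ln 2 \, (1 + A)^k)$ for $k \geq 1$ have a closed form; a minimax fit would give a smaller error for the same order. Helper names are hypothetical.

```python
import math
from typing import List

LN2 = math.log(2)


def coeffs_log2(alpha: int, n: int) -> List[List[float]]:
    """K[a][k] for each sub-interval start A = a * 2**-alpha (Taylor stand-in for minimax)."""
    table = []
    for a in range(1 << alpha):
        a_val = a * 2.0 ** -alpha
        row = [math.log2(1 + a_val)]
        row += [(-1) ** (k + 1) / (k * LN2 * (1 + a_val) ** k) for k in range(1, n + 1)]
        table.append(row)
    return table


def eval_piecewise(K: List[List[float]], x: int, w_in: int, alpha: int) -> float:
    """P^(A)(B) = sum_k K_k(A) * (2**-alpha * B)**k, with B read as the fraction .b1...b_beta."""
    beta = w_in - alpha
    a = x >> beta                            # A: the alpha most significant bits
    b = x & ((1 << beta) - 1)                # B: the beta least significant bits
    y = 2.0 ** -alpha * (b * 2.0 ** -beta)   # 2**-alpha * B
    return sum(c * y ** k for k, c in enumerate(K[a]))


if __name__ == "__main__":
    f = lambda t: math.log2(1 + t)
    for n in (1, 2, 3):
        K = coeffs_log2(alpha=4, n=n)
        worst = max(abs(eval_piecewise(K, x, 12, 4) - f(x / 4096)) for x in range(4096))
        print(f"order {n}: max error {worst:.3e}")
```

Here the Taylor remainder is at most $2^{-\alpha(n+1)} / ((n+1) \ln 2)$, so each extra order gains roughly a factor $2^{-\alpha}$; this is why higher orders allow fewer sub-intervals, hence smaller tables, for the same accuracy.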
Polynomial approximation: architecture
- A and B are extracted from X; the terms $K_0(A)$, $K_1(A) \cdot 2^{-\alpha} B$, $K_2(A) \cdot (2^{-\alpha} B)^2$, ..., $K_n(A) \cdot (2^{-\alpha} B)^n$ are each computed on $w_O + g$ bits, then summed to give the $w_O$-bit result f(X)
- architectural choices are needed to implement each term $T_k(A, B) = K_k(A) \cdot (2^{-\alpha} B)^k$
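A bit-accurate sketch of this architecture for $\log_2(1 + x)$, under our own assumptions: each term is tabulated directly, rounded to nearest on $w_O + g$ fractional bits, the terms are added as integers, and the sum is rounded to nearest on $w_O$ bits. Taylor coefficients stand in for the minimax ones, and the function names are hypothetical.

```python
import math

LN2 = math.log(2)


def coeff_log2(k: int, a_val: float) -> float:
    """K_k(A) for log2(1 + x): Taylor coefficient at A (stand-in for the minimax one)."""
    if k == 0:
        return math.log2(1 + a_val)
    return (-1) ** (k + 1) / (k * LN2 * (1 + a_val) ** k)


def evaluate_terms(x: int, w: int, alpha: int, n: int, g: int) -> int:
    """Return f(X) as a w-bit integer: n + 1 terms T_k(A, B), each held on w + g fractional bits."""
    assert g >= 1 and 0 < alpha < w
    beta = w - alpha
    a_val = (x >> beta) * 2.0 ** -alpha              # A
    y = (x & ((1 << beta) - 1)) * 2.0 ** -w          # 2**-alpha * B
    acc = 0
    for k in range(n + 1):
        # Tabulated term T_k(A, B), rounded to nearest on w + g fractional bits.
        acc += round(coeff_log2(k, a_val) * y ** k * 2 ** (w + g))
    # Final rounding to nearest on w bits: add half of the new ulp, drop g bits.
    return (acc + (1 << (g - 1))) >> g


if __name__ == "__main__":
    w, alpha, n, g = 12, 5, 2, 3
    worst = max(abs(evaluate_terms(x, w, alpha, n, g) / 2 ** w - math.log2(1 + x / 2 ** w))
                for x in range(1 << w))
    print(f"max error: {worst * 2 ** w:.3f} ulp")  # below 1 ulp means faithful rounding
```

With these parameters the budget can be checked by hand: the Taylor error is at most $2^{-15} / (3 \ln 2) \approx 0.06$ ulp, the three rounded terms add at most $3 \cdot 2^{-16} = 0.1875$ ulp, and the final rounding at most 0.5 ulp, so every result is within one ulp of the exact value.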
Computing the terms: exploiting symmetry
- each term $T_k(A, B)$ is symmetric with respect to the middle of each sub-interval, B being measured from that middle (B < 0 on the left half, B > 0 on the right half):
  - when k is even, $T_k(A, -B) = T_k(A, B)$
  - when k is odd, $T_k(A, -B) = -T_k(A, B)$
- the first bit $b_1$ of B selects the half of the sub-interval, so each term only has to be computed as $T_k(A, B')$ from the remaining bits B'
Computing the terms: simple look-up table
- tabulate all the possible values of $T_k(A, B')$, addressed by A and by B' (derived from $b_1$ and B)
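The symmetry trick of the previous slides can be sketched in Python. Assumptions of this sketch, not stated on the slides: B is a $\beta$-bit code whose value is measured from the middle of the sub-interval with a half-ulp offset, $y = B + 1/2 - 2^{\beta-1}$ (in units of the last bit of B), so that the codes are exactly symmetric; the table covers only the right half; and $K_k(A)$ with its scaling is folded into a single constant `coeff`. Names are hypothetical.

```python
from typing import List


def centered(b: int, beta: int) -> float:
    """Signed distance of code b from the middle of the sub-interval, in units of 2**-beta."""
    return b + 0.5 - (1 << (beta - 1))


def build_half_table(coeff: float, k: int, beta: int) -> List[float]:
    """coeff * y**k for the right half only: 2**(beta - 1) entries instead of 2**beta."""
    return [coeff * (u + 0.5) ** k for u in range(1 << (beta - 1))]


def symmetric_lookup(table: List[float], k: int, b: int, beta: int) -> float:
    """Read T_k for any code b from the half table."""
    mask = (1 << (beta - 1)) - 1
    b1 = b >> (beta - 1)                 # first bit of B: 1 on the right half
    rest = b & mask                      # the remaining beta - 1 bits
    u = rest if b1 else rest ^ mask      # XOR gates: invert the remaining bits when b1 = 0
    value = table[u]
    if k % 2 == 1 and not b1:            # odd terms change sign on the left half
        value = -value
    return value


if __name__ == "__main__":
    beta, coeff = 5, 0.75
    for k in (1, 2, 3):
        table = build_half_table(coeff, k, beta)
        ok = all(abs(symmetric_lookup(table, k, b, beta) - coeff * centered(b, beta) ** k) <= 1e-12
                 for b in range(1 << beta))
        print(f"k = {k}: {len(table)} entries, matches direct evaluation: {ok}")
```

The left half is reached by inverting the remaining bits with XOR gates driven by $b_1$, so the table has half the entries; odd-order terms additionally need a conditional negation.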
Computing the terms: power-and-multiply
- compute $S_k = B'^k$ with a powering unit
- split the $k(\beta - 1)$-bit word $S_k$ into several sub-words $S_{k,1}, S_{k,2}, \dots, S_{k,m_k}$ of respective sizes $\sigma_{k,1}, \sigma_{k,2}, \dots, \sigma_{k,m_k}$
- compute the product $K_k(A) \cdot S_k$ as the sum of all the sub-products $K_k(A) \cdot S_{k,j}$:
  - the most significant ones implemented as actual multipliers
  - the least significant ones implemented as look-up tables
  - exploit symmetry for each of those sub-products
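The decomposition of the product is exact integer arithmetic: with $S_k$ split into sub-words from the most significant end, $K_k(A) \cdot S_k = \sum_j K_k(A) \cdot S_{k,j} \cdot 2^{s_j}$, where $s_j$ is the number of bits to the right of sub-word j. A minimal sketch with hypothetical names; in hardware each sub-product would be a multiplier or a table, while here all are plain integer products.

```python
from typing import List, Tuple


def split_subwords(s: int, widths: List[int]) -> List[Tuple[int, int]]:
    """Split s (sum(widths) bits) into (S_j, shift_j) pairs, most significant sub-word first."""
    total = sum(widths)
    assert 0 <= s < (1 << total)
    parts, shift = [], total
    for w in widths:
        shift -= w
        parts.append(((s >> shift) & ((1 << w) - 1), shift))
    return parts


def product_by_subwords(coeff: int, s: int, widths: List[int]) -> int:
    """coeff * s as the sum of the shifted sub-products coeff * S_j."""
    return sum((coeff * sj) << shift for sj, shift in split_subwords(s, widths))


if __name__ == "__main__":
    beta, k = 6, 2
    widths = [4, 3, 3]                       # sigma_{k,1..3}
    assert sum(widths) == k * (beta - 1)     # S_k has k * (beta - 1) bits
    coeff, s = 0x2B7, 0b1011011101
    print(split_subwords(s, widths))
    print(product_by_subwords(coeff, s, widths) == coeff * s)
```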
Computing the terms: power-and-multiply
- A addresses the coefficient table $K_k(A)$; $b_1$ and B go through XOR gates to give B', which feeds the powering unit
- the powering unit output $S_k$ is split into $S_{k,1}, S_{k,2}, \dots, S_{k,m_k}$, and each sub-word is multiplied by $K_k(A)$ to give the sub-products $K_k(A) \cdot S_{k,j}$
Computing the terms: powering unit
- implemented as a look-up table
- or implemented as a sum of partial products of B, giving $S_k$
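As an illustration of a powering unit built from partial products (a standard squarer optimization, not necessarily the exact unit used in this work), the square $B^2 = \sum_i \sum_j b_i b_j 2^{i+j}$ can be folded: $b_i b_i = b_i$, and each symmetric pair $b_i b_j + b_j b_i$ becomes a single partial product $b_i b_j$ shifted one position left. This leaves $w(w+1)/2$ partial products instead of $w^2$. Hypothetical names:

```python
from typing import List, Tuple


def square_partial_products(b: int, width: int) -> List[Tuple[int, int]]:
    """Folded partial products of b*b as (bit, weight exponent) pairs."""
    bits = [(b >> i) & 1 for i in range(width)]
    pps = []
    for i in range(width):
        pps.append((bits[i], 2 * i))                     # b_i * b_i = b_i
        for j in range(i + 1, width):
            pps.append((bits[i] & bits[j], i + j + 1))   # b_i b_j + b_j b_i = 2 b_i b_j
    return pps


def square(b: int, width: int) -> int:
    """Sum of the folded partial products, as an adder tree would compute it."""
    return sum(bit << e for bit, e in square_partial_products(b, width))


if __name__ == "__main__":
    width = 6
    print(len(square_partial_products(0, width)), "partial products instead of", width * width)
    print(all(square(b, width) == b * b for b in range(1 << width)))
```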
Degrading accuracy
[Figure: the terms aligned by weight: $T_0(A) = K_0(A)$; $T_1(A, B) = K_1(A) \cdot 2^{-\alpha} B$, the sum of the sub-products $K_1(A) \cdot S_{1,1}$, $K_1(A) \cdot S_{1,2}$ and $K_1(A) \cdot S_{1,3}$, shifted right by $\alpha$ bits; $T_2(A, B) = K_2(A) \cdot (2^{-\alpha} B)^2$, the sum of $K_2(A) \cdot S_{2,1}$ and $K_2(A) \cdot S_{2,2}$, shifted right by $2\alpha$ bits.]
- some of the terms are more accurate than others
- we can save area by using fewer bits to compute the most accurate tables
Degrading accuracy: global architecture
- each term $T_k$ is computed using only:
  - $A_k$, the $\alpha_k$ most significant bits of A
  - $B_k$, the $\beta_k$ most significant bits of B
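Why higher-order terms tolerate a truncated A can be checked numerically. A sketch under our own assumptions (Taylor coefficients of $\log_2(1 + x)$ as stand-ins for minimax ones, hypothetical names): the table for $T_k$ is addressed by $A_k$, the $\alpha_k$ most significant bits of A, instead of A. For this function $|K_k'| \leq 1/\ln 2$ on [0, 1] and $0 \leq 2^{-\alpha} B < 2^{-\alpha}$, so the resulting error is below $2^{-\alpha_k} \cdot 2^{-k\alpha} / \ln 2$: each order gains another factor $2^{-\alpha}$ of slack.

```python
import math

LN2 = math.log(2)


def coeff_log2(k: int, a_val: float) -> float:
    """K_k(A) for log2(1 + x): Taylor coefficient at A (stand-in for minimax)."""
    if k == 0:
        return math.log2(1 + a_val)
    return (-1) ** (k + 1) / (k * LN2 * (1 + a_val) ** k)


def truncation_error(k: int, alpha: int, alpha_k: int, beta: int) -> float:
    """Largest |T_k(A, B) - T_k(A_k, B)| over all inputs, when T_k only sees the alpha_k MSBs of A."""
    y_max = ((1 << beta) - 1) * 2.0 ** -(alpha + beta)    # largest 2**-alpha * B
    worst = 0.0
    for a in range(1 << alpha):
        a_full = a * 2.0 ** -alpha
        a_trunc = (a >> (alpha - alpha_k)) * 2.0 ** -alpha_k
        diff = abs(coeff_log2(k, a_full) - coeff_log2(k, a_trunc))
        worst = max(worst, diff * y_max ** k)
    return worst


if __name__ == "__main__":
    alpha, beta, alpha_k = 6, 6, 3
    for k in (1, 2, 3):
        err = truncation_error(k, alpha, alpha_k, beta)
        bound = 2.0 ** -alpha_k * 2.0 ** (-k * alpha) / LN2
        print(f"T_{k}: error {err:.2e} <= bound {bound:.2e}")
```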
Degrading accuracy: global architecture
- X is split into A and B; the terms $T_0(A_0)$, $T_1(A_1, B_1)$, $T_2(A_2, B_2)$, ..., $T_n(A_n, B_n)$ are each computed on $w_O + g$ bits from their own truncated inputs, then summed to give the $w_O$-bit result f(X)
Degrading accuracy: power-and-multiply terms
- only the $\lambda_k$ most significant bits of $(B'_k)^k$ are used for $S_k$
- each sub-product $K_k(A_k) \cdot S_{k,j}$ is computed using only $A_{k,j}$, the $\alpha_{k,j}$ most significant bits of $A_k$
Degrading accuracy: power-and-multiply terms
- $b_1$ and $B_k$ go through XOR gates to give $B'_k$, which feeds the powering unit; its output $S_k$ is split into $S_{k,1}, S_{k,2}, \dots, S_{k,m_k}$
- each sub-word $S_{k,j}$ is multiplied by $K_k(A_{k,j})$, read from a table addressed by $A_{k,j}$
Degrading accuracy: ad-hoc powering units
- each ad-hoc powering unit, computing $S_k$ from $B_k$, is truncated to $\mu_k$ bits
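A truncated squarer (k = 2) can be sketched by summing the partial products of $B_k \cdot B_k$, with each symmetric pair $b_i b_j + b_j b_i$ merged into one partial product of double weight, and keeping only those whose weight falls in the $\mu$ most significant columns of the $2w$-bit square. Assumptions: squaring only, plain truncation without any compensation constant, hypothetical names. The error is one-sided and bounded by the value of all dropped partial products set to 1.

```python
from typing import List, Tuple


def folded_partial_products(b: int, width: int) -> List[Tuple[int, int]]:
    """Partial products of b*b as (bit, weight exponent), symmetric pairs folded."""
    bits = [(b >> i) & 1 for i in range(width)]
    pps = [(bits[i], 2 * i) for i in range(width)]
    pps += [(bits[i] & bits[j], i + j + 1) for i in range(width) for j in range(i + 1, width)]
    return pps


def truncated_square(b: int, width: int, mu: int) -> int:
    """Keep only the partial products of weight >= 2**(2*width - mu)."""
    cut = 2 * width - mu
    return sum(bit << e for bit, e in folded_partial_products(b, width) if e >= cut)


def truncation_bound(width: int, mu: int) -> int:
    """Worst-case value of the dropped partial products (all of them equal to 1)."""
    cut = 2 * width - mu
    return sum(1 << e for _, e in folded_partial_products(0, width) if e < cut)


if __name__ == "__main__":
    width, mu = 8, 10
    worst = max(b * b - truncated_square(b, width, mu) for b in range(1 << width))
    print(worst, "<=", truncation_bound(width, mu))
```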
Error analysis
- every error entailed by the operator is accurately bounded:
  - method errors (the minimax approximation error)
  - rounding errors
- we can then easily compute g, the number of guard bits required to ensure faithful rounding (last-bit accuracy)
- a trial-and-error method is then applied to decrease g
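A simplified error budget shows how g can follow from such bounds. Assumptions of this sketch, not the exact analysis of this work: the method error is known, each of the `n_terms` terms is rounded to nearest on $w_O + g$ bits (error at most $2^{-w_O-g-1}$ each), and the final rounding to nearest on $w_O$ bits adds at most half an ulp; faithful rounding then requires the total to stay below one ulp, $2^{-w_O}$. Exact rational arithmetic avoids floating-point doubts. The function name is hypothetical, and the trial-and-error refinement of g is not modelled.

```python
from fractions import Fraction


def guard_bits(w_out: int, eps_method: Fraction, n_terms: int, g_max: int = 64) -> int:
    """Smallest g such that method error + term roundings + final rounding < one ulp (2**-w_out)."""
    ulp = Fraction(1, 2 ** w_out)
    if eps_method >= ulp / 2:
        raise ValueError("method error alone uses up the rounding budget")
    for g in range(g_max + 1):
        eps_terms = n_terms * Fraction(1, 2 ** (w_out + g + 1))  # each term rounded to nearest on w_out + g bits
        if eps_method + eps_terms + ulp / 2 < ulp:               # final rounding to nearest: at most ulp / 2
            return g
    raise ValueError("no g <= g_max satisfies the budget")


if __name__ == "__main__":
    # Example: w_O = 16, method error 2**-18 (a quarter of an ulp), three terms (order 2).
    print(guard_bits(16, Fraction(1, 2 ** 18), 3))
```

With a method error of a quarter of an ulp and three terms, this budget asks for g = 3; with a negligible method error, g = 2 already suffices.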
Results
Results: area estimations for $\log_2(1 + x)$
[Figure: operator area (in slices) and FPGA area ratio versus input/output precision $w_I = w_O$ (in bits), for SMSO and for orders 2, 3 and 4.]
- as expected, the area grows exponentially with the precision
- order 2 up to 24 bits, order 3 up to 28 bits, order 4 up to 32 bits
Results: area estimations for $\sin x$
[Figure: operator area (in slices) and FPGA area ratio versus input/output precision $w_I = w_O$ (in bits), for SMSO and for orders 2, 3 and 4.]
Results: delay estimations for $\log_2(1 + x)$
[Figure: operator delay (in ns) versus input/output precision $w_I = w_O$ (in bits), for SMSO and for orders 2, 3 and 4.]
- latency increases for higher orders
Results: delay estimations for $\sin x$
[Figure: operator delay (in ns) versus input/output precision $w_I = w_O$ (in bits), for SMSO and for orders 2, 3 and 4.]
Conclusion
Contribution
- a novel function approximation method:
  - arbitrary order: smaller tables
  - optimized powering units
  - small multipliers: a shorter critical path, and the ability to benefit from recent FPGA technologies (Virtex-II)
  - highly parameterizable design, adaptable to various metrics
  - accurate approximation and rounding error analysis
- targeted at precisions up to 32 bits
Future work
- improve the parameter space exploration heuristic, following user-specified criteria
- adapt this method to ASIC (different metric, architectural choices, ...)
- take advantage of the accurate error analysis method to finely tune the tables
- work in progress: a library of parameterizable floating-point operators for elementary functions: logarithm, exponential
Thank you for your attention
- more information: CVS repository:
- Questions?
What s the Deal? MULTIPLICATION Time to multiply Multiplying two numbers requires a multiply Luckily, in binary that s just an AND gate! 0*0=0, 0*1=0, 1*0=0, 1*1=1 Generate a bunch of partial products
More information1 Short adders. t total_ripple8 = t first + 6*t middle + t last = 4t p + 6*2t p + 2t p = 18t p
UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Study Homework: Arithmetic NTU IC54CA (Fall 2004) SOLUTIONS Short adders A The delay of the ripple
More information1. Use the properties of exponents to simplify the following expression, writing your answer with only positive exponents.
Math120 - Precalculus. Final Review. Fall, 2011 Prepared by Dr. P. Babaali 1 Algebra 1. Use the properties of exponents to simplify the following expression, writing your answer with only positive exponents.
More informationProposal to Improve Data Format Conversions for a Hybrid Number System Processor
Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 6-8, 007 653 Proposal to Improve Data Format Conversions for a Hybrid Number System Processor
More informationGal s Accurate Tables Method Revisited
Gal s Accurate Tables Method Revisited Damien Stehlé UHP/LORIA 615 rue du jardin botanique F-5460 Villers-lès-Nancy Cedex stehle@loria.fr Paul Zimmermann INRIA Lorraine/LORIA 615 rue du jardin botanique
More informationFast and accurate Bessel function computation
0 Fast and accurate Bessel function computation John Harrison, Intel Corporation ARITH-19 Portland, OR Tue 9th June 2009 (11:00 11:30) 1 Bessel functions and their computation Bessel functions are certain
More informationThis is your first impression to me as a mathematician. Make it good.
Calculus Summer 2016 DVHS (AP or RIO) Name : Welcome! Congratulations on reaching this advanced level of mathematics. Calculus is unlike the mathematics you have already studied, and yet it is built on
More informationLow-complexity generation of scalable complete complementary sets of sequences
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Low-complexity generation of scalable complete complementary sets
More informationAuthor(s) Beuchat, Jean-Luc; Muller, Jean-Mic. Citation IEEE transactions on computers, 57(
Title Automatic Generation of Modular Mul Applications Author(s) Beuchat, Jean-Luc; Muller, Jean-Mic Citation IEEE transactions on computers, 57( Issue Date 008-1 Text version publisher URL http://hdl.handle.net/41/101169
More informationCHALLENGE! (0) = 5. Construct a polynomial with the following behavior at x = 0:
TAYLOR SERIES Construct a polynomial with the following behavior at x = 0: CHALLENGE! P( x) = a + ax+ ax + ax + ax 2 3 4 0 1 2 3 4 P(0) = 1 P (0) = 2 P (0) = 3 P (0) = 4 P (4) (0) = 5 Sounds hard right?
More informationOptimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves
CT-RSA 2012 February 29th, 2012 Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves Joint work with: Nicolas Estibals CARAMEL project-team, LORIA, Université de Lorraine / CNRS / INRIA,
More informationMath 12 Final Exam Review 1
Math 12 Final Exam Review 1 Part One Calculators are NOT PERMITTED for this part of the exam. 1. a) The sine of angle θ is 1 What are the 2 possible values of θ in the domain 0 θ 2π? 2 b) Draw these angles
More informationChapter 2 Algorithms for Periodic Functions
Chapter 2 Algorithms for Periodic Functions In this chapter we show how to compute the Discrete Fourier Transform using a Fast Fourier Transform (FFT) algorithm, including not-so special case situations
More informationComputer Architecture 10. Fast Adders
Computer Architecture 10 Fast s Ma d e wi t h Op e n Of f i c e. o r g 1 Carry Problem Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance
More information7.0: Minimax approximations
7.0: Minimax approximations In this section we study the problem min f p = min max f(x) p(x) p A p A x [a,b] where f C[a, b] and A is a linear subspace of C[a, b]. Let p be a trial solution (e.g. a guess)
More informationLRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation
LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation Jingyang Zhu 1, Zhiliang Qian 2, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and
More informationSolution of Algebric & Transcendental Equations
Page15 Solution of Algebric & Transcendental Equations Contents: o Introduction o Evaluation of Polynomials by Horner s Method o Methods of solving non linear equations o Bracketing Methods o Bisection
More informationImplementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System
Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System G.Suresh, G.Indira Devi, P.Pavankumar Abstract The use of the improved table look up Residue Number System
More informationConstruction of a reconfigurable dynamic logic cell
PRAMANA c Indian Academy of Sciences Vol. 64, No. 3 journal of March 2005 physics pp. 433 441 Construction of a reconfigurable dynamic logic cell K MURALI 1, SUDESHNA SINHA 2 and WILLIAM L DITTO 3 1 Department
More informationHardware implementations of ECC
Hardware implementations of ECC The University of Electro- Communications Introduction Public- key Cryptography (PKC) The most famous PKC is RSA and ECC Used for key agreement (Diffie- Hellman), digital
More informationDesigning a Correct Numerical Algorithm
Intro Implem Errors Sollya Gappa Norm Conc Christoph Lauter Guillaume Melquiond March 27, 2013 Intro Implem Errors Sollya Gappa Norm Conc Outline 1 Introduction 2 Implementation theory 3 Error analysis
More informationStep 1: Greatest Common Factor Step 2: Count the number of terms If there are: 2 Terms: Difference of 2 Perfect Squares ( + )( - )
Review for Algebra 2 CC Radicals: r x p 1 r x p p r = x p r = x Imaginary Numbers: i = 1 Polynomials (to Solve) Try Factoring: i 2 = 1 Step 1: Greatest Common Factor Step 2: Count the number of terms If
More informationHardware Acceleration of the Tate Pairing in Characteristic Three
Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1 Introduction Pairing based cryptography is a (fairly)
More informationComputing Machine-Efficient Polynomial Approximations
Computing Machine-Efficient Polynomial Approximations NICOLAS BRISEBARRE Université J. Monnet, St-Étienne and LIP-E.N.S. Lyon JEAN-MICHEL MULLER CNRS, LIP-ENS Lyon and ARNAUD TISSERAND INRIA, LIP-ENS Lyon
More informationA HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS *
Copyright IEEE 999: Published in the Proceedings of Globecom 999, Rio de Janeiro, Dec 5-9, 999 A HIGH-SPEED PROCESSOR FOR RECTAGULAR-TO-POLAR COVERSIO WITH APPLICATIOS I DIGITAL COMMUICATIOS * Dengwei
More informationHARDWARE IMPLEMENTATION OF FIR/IIR DIGITAL FILTERS USING INTEGRAL STOCHASTIC COMPUTATION. Arash Ardakani, François Leduc-Primeau and Warren J.
HARWARE IMPLEMENTATION OF FIR/IIR IGITAL FILTERS USING INTEGRAL STOCHASTIC COMPUTATION Arash Ardakani, François Leduc-Primeau and Warren J. Gross epartment of Electrical and Computer Engineering McGill
More informationTunable Floating-Point for Energy Efficient Accelerators
Tunable Floating-Point for Energy Efficient Accelerators Alberto Nannarelli DTU Compute, Technical University of Denmark 25 th IEEE Symposium on Computer Arithmetic A. Nannarelli (DTU Compute) Tunable
More informationfunction independent dependent domain range graph of the function The Vertical Line Test
Functions A quantity y is a function of another quantity x if there is some rule (an algebraic equation, a graph, a table, or as an English description) by which a unique value is assigned to y by a corresponding
More information8.5 Taylor Polynomials and Taylor Series
8.5. TAYLOR POLYNOMIALS AND TAYLOR SERIES 50 8.5 Taylor Polynomials and Taylor Series Motivating Questions In this section, we strive to understand the ideas generated by the following important questions:
More informationChapter 1 Numerical approximation of data : interpolation, least squares method
Chapter 1 Numerical approximation of data : interpolation, least squares method I. Motivation 1 Approximation of functions Evaluation of a function Which functions (f : R R) can be effectively evaluated
More informationPolynomial Functions and Their Graphs
Polynomial Functions and Their Graphs Definition of a Polynomial Function Let n be a nonnegative integer and let a n, a n- 1,, a 2, a 1, a 0, be real numbers with a n 0. The function defined by f (x) a
More informationECE 645: Lecture 2. Carry-Lookahead, Carry-Select, & Hybrid Adders
ECE 645: Lecture 2 Carry-Lookahead, Carry-Select, & Hybrid Adders Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 6, Carry-Lookahead Adders Sections 6.1-6.2.
More informationy = 5 x. Which statement is true? x 2 6x 25 = 0 by completing the square?
Algebra /Trigonometry Regents Exam 064 www.jmap.org 064a Which survey is least likely to contain bias? ) surveying a sample of people leaving a movie theater to determine which flavor of ice cream is the
More informationTHIS paper is devoted to the study of modular multiplication
AUTOATIC GENERATION OF OULAR ULTIPLIERS FOR FPGA APPLICATIONS Automatic Generation of odular ultipliers for FPGA Applications Jean-Luc Beuchat and Jean-ichel uller, Senior ember, IEEE LIP Research Report
More informationJanus: FPGA Based System for Scientific Computing Filippo Mantovani
Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Ferrara, 28/09/2009 Overview: 1. The physical problem: - Ising model and Spin Glass
More informationAES [and other Block Ciphers] Implementation Tricks
AES [and other Bloc Ciphers] Implementation Trics Cryptographic algorithms Basic primitives Survey by Stephen et al, LNCS 1482, Sep. 98 General Structure of a Bloc Cipher Useful Properties for Implementing
More information