Implementing Fast Carryless Multiplication

Joris van der Hoeven, Robin Larrieu and Grégoire Lecerf
CNRS & École polytechnique
MACIS 2017, Nov. 15, Vienna, Austria

Outline
1. Introduction: carryless multiplication; state of the art
2. Presentation of the algorithm
3. Implementation details
4. Perspectives

Introduction: Carryless multiplication
Carryless multiplication is multiplication in $\mathbb{F}_2[X]$, here for polynomials of large degree (typically $10^6$). The name comes from viewing such polynomials as bit strings: the product is computed like an integer product, except that partial products are added with XOR, so no carries propagate. Fast algorithms for these sizes use FFT multiplication.
Problem: $\mathbb{F}_2$ has too few evaluation points, so one has to work in an extension field. How can the corresponding overhead be minimized?
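
A minimal Python sketch of the operation itself, assuming a polynomial over $\mathbb{F}_2$ is stored as a Python integer whose bit $i$ is the coefficient of $X^i$. This quadratic schoolbook loop is only a reference point, not one of the fast algorithms discussed here; the name `clmul` is our own.

```python
def clmul(a: int, b: int) -> int:
    """Carryless product of two polynomials over F_2 packed into ints.

    Bit i of each argument is the coefficient of X^i. Shifted copies of `a`
    are combined with XOR instead of integer addition, so no carries occur.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


# (X + 1)^2 = X^2 + 1 over F_2, whereas the integer product 3 * 3 is 9.
print(bin(clmul(0b11, 0b11)))  # 0b101
```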

Introduction: State of the art
1. Triadic Schönhage-Strassen algorithm (gf2x; Brent, Gaudry, Thomé, Zimmermann)
2. FFT over $\mathbb{F}_{2^{60}}$ (Harvey, van der Hoeven, Lecerf 2016)
3. Additive FFT over binary extension fields (Chen, Cheng, Kuo, Li, Yang 2017)
This work (1): an improvement of strategy no. 2 using ideas from the Frobenius FFT algorithm (van der Hoeven, Larrieu 2017), achieving a speed-up by a factor of 2.
(1) Source code available from revision … of our SVN server (…), in the justinline library.

Introduction: Why $\mathbb{F}_{2^{60}}$?
Efficient arithmetic in $\mathbb{F}_{2^{60}}$:
- elements fit in slightly less than a machine word (60 of 64 bits);
- $\mu(x) := (x^{61} - 1)/(x - 1)$ is irreducible over $\mathbb{F}_2$.
Efficient FFT:
- roots of unity of order $2^{60} - 1 = 3^2 \cdot 5^2 \cdot 7 \cdot 11 \cdot 13 \cdot 31 \cdot 41 \cdot 61 \cdot 151 \cdot 331 \cdot 1321$.
Bonus:
- 61 divides $2^{60} - 1$ (Fermat's theorem);
- 2 generates $(\mathbb{Z}/61\mathbb{Z})^*$ (equivalently, $\mu(x)$ is irreducible).
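
The arithmetic facts above can be checked with a few lines of Python. The sketch below assumes elements of $\mathbb{F}_{2^{60}} = \mathbb{F}_2[x]/(\mu(x))$ are stored as 60-bit integers (bit $i$ = coefficient of $x^i$); `gf_mul` is a naive reference (carryless product followed by bit-by-bit reduction modulo $\mu$), not the word-level routine used by the authors.

```python
MU = (1 << 61) - 1  # mu(x) = 1 + x + ... + x^60 = (x^61 - 1)/(x - 1)


def clmul(a, b):
    """Carryless product in F_2[x] (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r


def gf_mul(a, b):
    """Product in F_{2^60} = F_2[x]/(mu(x)); elements are 60-bit ints."""
    p = clmul(a, b)  # degree <= 118
    for i in range(p.bit_length() - 1, 59, -1):
        if (p >> i) & 1:
            p ^= MU << (i - 60)  # x^i = x^(i-60) * (1 + x + ... + x^59)
    return p


# 2^60 - 1, the order of the multiplicative group, is very smooth.
FACTORS = [3, 3, 5, 5, 7, 11, 13, 31, 41, 61, 151, 331, 1321]
n = 1
for q in FACTORS:
    n *= q
assert n == 2**60 - 1

# Fermat: 61 is prime, hence 2^60 = 1 (mod 61), i.e. 61 divides 2^60 - 1.
assert pow(2, 60, 61) == 1
# 2 generates (Z/61Z)^*: its multiplicative order modulo 61 is 60.
assert min(k for k in range(1, 61) if pow(2, k, 61) == 1) == 60

# The class z of x satisfies z^61 = 1 (mu divides x^61 - 1), and z != 1.
z, power = 0b10, 1
for _ in range(61):
    power = gf_mul(power, z)
assert power == 1
```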

Presentation of the algorithm (Kronecker segmentation vs. Frobenius FFT; our variant of the Frobenius FFT; Frobenius encoding)

Presentation of the algorithm: Kronecker segmentation vs. Frobenius FFT
Naive strategy: $\mathbb{F}_2[X]_{<n} \to \mathbb{F}_{2^{60}}[X]_{<n}$. Lift $A, B \in \mathbb{F}_2[X]$ to $\tilde{A}, \tilde{B} \in \mathbb{F}_{2^{60}}[X]$, compute $\tilde{A}\tilde{B} \in \mathbb{F}_{2^{60}}[X]$ and read off $AB \in \mathbb{F}_2[X]$. Each input bit then occupies a whole 60-bit field element.

Kronecker segmentation: $\mathbb{F}_2[X]_{<n} \to \mathbb{F}_2[X]_{<30}[Z]_{<n/30} \to \mathbb{F}_{2^{60}}[Z]_{<n/30}$. Cut $A$ and $B$ into chunks of 30 bits, giving $\tilde{A}, \tilde{B} \in \mathbb{F}_2[X]_{<30}[Z]$. Their product $\tilde{A}\tilde{B}$ lies in $\mathbb{F}_2[X]_{<60}[Z]$ and can therefore be computed in $\mathbb{F}_{2^{60}}[Z]$; substituting $Z = X^{30}$ yields $AB \in \mathbb{F}_2[X]$.
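
A Python sketch of Kronecker segmentation on packed integers (bit $i$ = coefficient of $X^i$): the inputs are cut into 30-bit chunks, the chunk polynomials are multiplied, and $Z = X^{30}$ is substituted back. Two chunks of degree $< 30$ have a product of degree $\le 58$, which is why each coefficient of $\tilde{A}\tilde{B}$ fits in a 60-bit element of $\mathbb{F}_{2^{60}}$ without any reduction modulo $\mu$. The quadratic double loop is an illustrative stand-in for the FFT over $\mathbb{F}_{2^{60}}$; `split` and `kronecker_mul` are hypothetical names.

```python
CHUNK = 30  # bits per coefficient of F_2[X]_{<30}[Z]


def clmul(a, b):
    """Carryless product of packed polynomials over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r


def split(a, chunk=CHUNK):
    """Write a = sum_k A_k(X) X^(chunk*k) with deg A_k < chunk."""
    mask = (1 << chunk) - 1
    parts = []
    while a:
        parts.append(a & mask)
        a >>= chunk
    return parts


def kronecker_mul(a, b, chunk=CHUNK):
    pa, pb = split(a, chunk), split(b, chunk)
    if not pa or not pb:
        return 0
    # Product in F_2[X]_{<2*chunk-1}[Z]: every coefficient fits in 60 bits,
    # so these chunk products never need reduction modulo mu in F_{2^60}.
    prod = [0] * (len(pa) + len(pb) - 1)
    for i, x in enumerate(pa):
        for j, y in enumerate(pb):
            prod[i + j] ^= clmul(x, y)
    # Substitute Z = X^chunk: neighbouring coefficients overlap and are XORed.
    result = 0
    for k, c in enumerate(prod):
        result ^= c << (chunk * k)
    return result
```

As a check, `kronecker_mul(3**500, 5**300) == clmul(3**500, 5**300)` holds. The sketch also makes the cost of the method visible: only 30 of the 60 bits of each input field element carry data.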

Frobenius FFT: for a root of unity $\omega$, the Frobenius map $\varphi: x \mapsto x^2$ acts on $\{1, \omega, \omega^2, \omega^3, \ldots\}$. The naive DFT $A \mapsto [A(1), A(\omega), A(\omega^2), \ldots]$ therefore performs redundant computations, since $A(x^2) = A(x)^2$ for all $A \in \mathbb{F}_2[X]$ and $x \in \mathbb{F}_{2^{60}}$.
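
The identity $A(x^2) = A(x)^2$ holds because squaring is additive in characteristic 2 and fixes the coefficients 0 and 1. A quick numerical check in Python, assuming the same 60-bit representation of $\mathbb{F}_{2^{60}} = \mathbb{F}_2[x]/(\mu(x))$ as before (the helpers are repeated so that the block runs on its own; `eval_f2_poly` is a hypothetical name):

```python
MU = (1 << 61) - 1  # F_{2^60} = F_2[x]/(mu), mu = (x^61 - 1)/(x - 1)


def clmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r


def gf_mul(a, b):
    p = clmul(a, b)
    for i in range(p.bit_length() - 1, 59, -1):
        if (p >> i) & 1:
            p ^= MU << (i - 60)
    return p


def eval_f2_poly(coeff_bits, x):
    """Horner evaluation of A = sum_i a_i X^i (a_i = bit i) at x in F_{2^60}."""
    acc = 0
    for i in range(coeff_bits.bit_length() - 1, -1, -1):
        acc = gf_mul(acc, x) ^ ((coeff_bits >> i) & 1)
    return acc


A = 0b1011001110001      # an arbitrary polynomial in F_2[X]
x = 0x0F1E2D3C4B5A697    # an arbitrary element of F_{2^60}
x_squared = gf_mul(x, x)  # phi(x)
assert eval_f2_poly(A, x_squared) == gf_mul(eval_f2_poly(A, x), eval_f2_poly(A, x))
```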

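Presentation of the algorithm: Our variant of the Frobenius FFT
Let $A \in \mathbb{F}_2[X]_{<60m}$ and let $\omega$ have order $61m$. The full $\mathrm{DFT}_\omega$ of $A$ splits into 61 blocks $[A(\omega^{61i})]_i$, $[A(\omega^{61i+1})]_i$, $[A(\omega^{61i+2})]_i$, $[A(\omega^{61i+3})]_i$, $[A(\omega^{61i+4})]_i$, and so on. The Frobenius map $\varphi$ sends the block $[A(\omega^{61i+1})]_i$ onto $[A(\omega^{61i+2})]_i$, that one onto $[A(\omega^{61i+4})]_i$, etc., so only $[A(\omega^{61i+1})]_i$ is computed: the Frobenius encoding turns $A$ into $\bar{A} \in \mathbb{F}_{2^{60}}[X]_{<m}$, and a DFT of length $m$ of $\bar{A}$ yields exactly this block.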
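
The way $\varphi$ permutes the blocks can be checked on exponents alone. The Python sketch below (the function name is ours) assumes, as in the multiplication algorithm, that $\omega$ has order $61m$ with $m$ odd; it verifies that squaring maps block $r$ onto block $2r \bmod 61$, and that starting from block 1 it reaches all 60 nonzero blocks, because 2 generates $(\mathbb{Z}/61\mathbb{Z})^*$.

```python
def frobenius_blocks(m):
    """Track how phi (omega^j -> omega^(2j), with omega of order 61*m and m odd)
    permutes the blocks {omega^(61i + r) : i < m}, r = 0, ..., 60."""
    n = 61 * m
    assert m % 2 == 1  # then j -> 2j is a bijection modulo n
    for r in range(61):
        block = {61 * i + r for i in range(m)}
        image = {(2 * j) % n for j in block}
        assert image == {61 * i + (2 * r) % 61 for i in range(m)}
    # Follow block 1 under repeated squaring: r -> 2r (mod 61).
    reached, r = [], 1
    while r not in reached:
        reached.append(r)
        r = (2 * r) % 61
    return reached


chain = frobenius_blocks(15)
print(chain[:6])   # [1, 2, 4, 8, 16, 32]
print(len(chain))  # 60: every block except r = 0 follows from block 1
```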

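Presentation of the algorithm: Multiplication algorithm
Let $A \in \mathbb{F}_2[X]_{<a}$ and $B \in \mathbb{F}_2[X]_{<b}$, and choose $m$ such that $a + b < 60m$ and $61m$ divides $2^{60} - 1$. Write $E_\omega(A) := [A(\omega^{61i+1})]_{i<m}$.
1. Frobenius encoding: $A \mapsto \bar{A} \in \mathbb{F}_{2^{60}}[X]_{<a/60}$ and $B \mapsto \bar{B} \in \mathbb{F}_{2^{60}}[X]_{<b/60}$.
2. $\mathrm{DFT}_\omega$: $\bar{A} \mapsto E_\omega(A) \in \mathbb{F}_{2^{60}}^m$ and $\bar{B} \mapsto E_\omega(B) \in \mathbb{F}_{2^{60}}^m$.
3. Pointwise product: $E_\omega(AB) = E_\omega(A) \cdot E_\omega(B) \in \mathbb{F}_{2^{60}}^m$.
4. $\mathrm{DFT}_\omega^{-1}$: $E_\omega(AB) \mapsto \overline{AB} \in \mathbb{F}_{2^{60}}[X]_{<m}$.
5. Frobenius decoding: $\overline{AB} \mapsto AB \in \mathbb{F}_2[X]_{<a+b}$.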
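
The two constraints on $m$ already determine the transform length. As an illustration, the hypothetical helper `choose_m` below returns the smallest $m$ with $a + b < 60m$ and $61m \mid 2^{60} - 1$, using the factorization of $2^{60} - 1$. It is a sketch under those two constraints only; the actual implementation may pick $m$ differently (for instance to favour lengths that suit its FFT).

```python
from itertools import product

# 2^60 - 1 = 3^2 * 5^2 * 7 * 11 * 13 * 31 * 41 * 61 * 151 * 331 * 1321, so the
# admissible m are the divisors of (2^60 - 1)/61.
EXPONENTS = {3: 2, 5: 2, 7: 1, 11: 1, 13: 1, 31: 1, 41: 1,
             151: 1, 331: 1, 1321: 1}


def admissible_lengths():
    """All m such that 61*m divides 2^60 - 1, in increasing order."""
    primes = list(EXPONENTS)
    divisors = []
    for exps in product(*(range(EXPONENTS[p] + 1) for p in primes)):
        d = 1
        for p, e in zip(primes, exps):
            d *= p**e
        divisors.append(d)
    return sorted(divisors)


def choose_m(a, b):
    """Smallest admissible m with a + b < 60*m (a, b = input lengths in bits)."""
    for m in admissible_lengths():
        if a + b < 60 * m:
            return m
    raise ValueError("inputs too large for a single transform over F_{2^60}")
```

For instance, `choose_m(100, 79)` returns 3, while `choose_m(100, 80)` jumps to 5 because 4 is not admissible. Such jumps are the staircase effect that the Truncated Fourier Transform is meant to reduce.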

Presentation of the algorithm: Frobenius encoding
Cooley-Tukey FFT:
$$A(\omega^{61i+1}) = \sum_{k<m} \omega^{k} \Big(\sum_{l<61} a_{k+ml}\, \omega^{ml}\Big)\, \omega^{61ki}$$

With $\theta := \omega^m$ and $\tilde{\omega} := \omega^{61}$, set
$$\tilde{A}_k := \sum_{l<61} a_{k+ml} X^l \in \mathbb{F}_2[X]_{<60} \quad (A \in \mathbb{F}_2[X]_{<60m}), \qquad \bar{A} := \sum_{k<m} \omega^k\, \tilde{A}_k(\theta)\, Z^k \in \mathbb{F}_{2^{60}}[Z]_{<m}.$$
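
The step from the Cooley-Tukey formula to these definitions is a short computation, with $\theta = \omega^m$ and $\tilde{\omega} = \omega^{61}$ as above. It only uses that $\omega$ has order $61m$, so that $\omega^{61m} = 1$, and writes each exponent $j < 60m$ as $j = k + ml$ with $k < m$ and $l < 61$ (the coefficient $a_{k+ml}$ vanishes when $k + ml \ge 60m$):

```latex
\begin{aligned}
A(\omega^{61i+1})
  &= \sum_{k<m} \sum_{l<61} a_{k+ml}\, \omega^{(k+ml)(61i+1)}
   = \sum_{k<m} \sum_{l<61} a_{k+ml}\, \omega^{k}\, \omega^{ml}\, \omega^{61ki}\, \bigl(\omega^{61m}\bigr)^{li} \\
  &= \sum_{k<m} \omega^{k} \Bigl(\sum_{l<61} a_{k+ml}\, \theta^{l}\Bigr) \tilde{\omega}^{ki}
   = \sum_{k<m} \omega^{k}\, \tilde{A}_k(\theta)\, \tilde{\omega}^{ki}
   = \bar{A}\bigl(\tilde{\omega}^{i}\bigr).
\end{aligned}
```

Hence $[A(\omega^{61i+1})]_{i<m}$ is the length-$m$ DFT of $\bar{A}$ at $\tilde{\omega}$, which has order $m$.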

Technical assumption: $\omega$ is chosen such that $\theta = z \bmod \mu(z)$, with $\mu(z) := (z^{61} - 1)/(z - 1)$; that is, $\mathbb{F}_{2^{60}} = \mathbb{F}_2[z]/(\mu(z))$ and $\theta$ is the class of $z$. Since $\deg \tilde{A}_k < 60 = \deg \mu$, the element $\tilde{A}_k(\theta)$ is then simply the 60-bit word holding the coefficients of $\tilde{A}_k$.
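
Putting the pieces together, the following Python sketch runs the whole multiplication algorithm on small inputs. It is a toy model under stated assumptions, not the authors' implementation: $\mathbb{F}_{2^{60}} = \mathbb{F}_2[z]/(\mu(z))$ with 60-bit integer elements; $m$ is restricted to a prime dividing $(2^{60}-1)/61$ (for example 3 or 5) so that a root $\omega$ with $\omega^m = z$ (the technical assumption) is easy to construct; and the FFTs are replaced by naive $O(m^2)$ DFTs. All function names are hypothetical, and `pow(m, -1, 61)` needs Python 3.8 or later.

```python
MU = (1 << 61) - 1     # mu(z) = (z^61 - 1)/(z - 1); F_{2^60} = F_2[z]/(mu)
ORDER = (1 << 60) - 1  # order of the multiplicative group of F_{2^60}
Z = 0b10               # the class of z, an element of order 61


def clmul(a, b):  # carryless product of packed F_2[X] polynomials
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r


def gf_mul(a, b):  # product in F_{2^60}, elements stored in 60 bits
    p = clmul(a, b)
    for i in range(p.bit_length() - 1, 59, -1):
        if (p >> i) & 1:
            p ^= MU << (i - 60)
    return p


def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a, e = gf_mul(a, a), e >> 1
    return r


def find_omega(m):
    # Assumes m is a prime dividing (2^60 - 1)/61. omega = z^u * eta with
    # u*m = 1 (mod 61) and eta of order m, so omega^m = z and ord(omega) = 61m.
    h = 3
    while gf_pow(h, ORDER // m) == 1:
        h += 1
    return gf_mul(gf_pow(Z, pow(m, -1, 61)), gf_pow(h, ORDER // m))


def dft(coeffs, root):  # naive O(m^2) DFT: [sum_k c_k root^(ik)]_{i<m}
    out, ri = [], 1
    for _ in coeffs:
        acc, rik = 0, 1
        for c in coeffs:
            acc ^= gf_mul(c, rik)
            rik = gf_mul(rik, ri)
        out.append(acc)
        ri = gf_mul(ri, root)
    return out


def encode(a, m, omega):
    # Abar_k = omega^k * Atilde_k(theta); since theta = z, bit l of
    # Atilde_k(theta) is simply the coefficient a_{k+ml}.
    out, wk = [], 1
    for k in range(m):
        word = sum(((a >> (k + m * l)) & 1) << l for l in range(60))
        out.append(gf_mul(wk, word))
        wk = gf_mul(wk, omega)
    return out


def decode(cbar, m, omega):
    # Undo the twiddle factors omega^k, then scatter bit l back to X^(k+ml).
    inv, wk, a = gf_pow(omega, 61 * m - 1), 1, 0
    for k in range(m):
        word = gf_mul(wk, cbar[k])
        for l in range(60):
            a |= ((word >> l) & 1) << (k + m * l)
        wk = gf_mul(wk, inv)
    return a


def frobenius_multiply(a, b, m):
    assert a.bit_length() + b.bit_length() < 60 * m  # a + b < 60m
    omega = find_omega(m)
    wt = gf_pow(omega, 61)  # omega~ = omega^61, of order m
    ea = dft(encode(a, m, omega), wt)  # E_omega(A) = [A(omega^(61i+1))]_i
    eb = dft(encode(b, m, omega), wt)
    ec = [gf_mul(x, y) for x, y in zip(ea, eb)]  # E_omega(AB)
    # m is odd, so 1/m = 1 in characteristic 2: the inverse DFT is the DFT
    # at omega~^(-1) = omega~^(m-1).
    return decode(dft(ec, gf_pow(wt, m - 1)), m, omega)
```

For example, `frobenius_multiply(3**50, 5**40, 3)` agrees with `clmul(3**50, 5**40)`; the inputs have 80 and 93 bits, and $80 + 93 < 60 \cdot 3$.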

Implementation details (data representation; Frobenius encoding; timings)

Implementation details: Data representation
Polynomials over $\mathbb{F}_2$: packed representation.
Elements of $\mathbb{F}_{2^{60}}$: one machine word each; polynomials over $\mathbb{F}_{2^{60}}$: arrays of words.
Matrices over $\mathbb{F}_2$: packed column representation.
$A \in \mathbb{F}_2[X]_{<60m}$ is viewed as a $60 \times m$ matrix, from which the polynomials $\tilde{A}_k(X)$ are read off.
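
A minimal Python sketch of the packed representation, assuming (our choice for illustration) little-endian bit order: coefficient $i$ of a polynomial over $\mathbb{F}_2$ is stored in bit $i \bmod 64$ of word $\lfloor i/64 \rfloor$. `pack` and `unpack` are hypothetical names.

```python
WORD = 64  # bits per machine word


def pack(coeffs):
    """Packed representation of a polynomial over F_2 given by its list of
    coefficients: coefficient i goes to bit i % 64 of word i // 64."""
    words = [0] * ((len(coeffs) + WORD - 1) // WORD)
    for i, c in enumerate(coeffs):
        if c & 1:
            words[i // WORD] |= 1 << (i % WORD)
    return words


def unpack(words, n):
    """Inverse of pack for the first n coefficients."""
    return [(words[i // WORD] >> (i % WORD)) & 1 for i in range(n)]


# An element of F_{2^60} fits in one word (bits 60..63 stay zero); a polynomial
# over F_{2^60} is then just a list of such words, one per coefficient.
```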

Implementation details: Frobenius encoding
1. See $A$ as a $60 \times m$ matrix and add 4 columns for alignment.
2. Transpose the resulting $64 \times m$ matrix (this yields $[\tilde{A}_k(\theta)]_{k<m}$).
3. Multiply by the twiddle factors $\omega^k$ (this yields $\bar{A}$).
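
A bit-level Python sketch of the first two steps. We assume, for illustration only, that the packed bits of $A$ are read row by row, row $l$ holding $a_{ml}, \ldots, a_{ml+m-1}$, and that the padding to a $64 \times m$ matrix consists of zero rows. After transposition, word $k$ holds $a_k, a_{k+m}, \ldots, a_{k+59m}$, which is $\tilde{A}_k(\theta)$ under the technical assumption $\theta = z$. The naive loop in `transpose` only shows the data movement; the function names are hypothetical.

```python
def to_rows(a, m, rows=64):
    """Cut packed A (fewer than 60m bits) into `rows` rows of m bits each;
    row l holds a_{ml}, ..., a_{ml+m-1}. Rows 60..63 are zero padding."""
    mask = (1 << m) - 1
    return [(a >> (m * l)) & mask for l in range(rows)]


def transpose(rows, ncols):
    """Bit-matrix transpose: column k of `rows` becomes the packed word k."""
    cols = [0] * ncols
    for l, row in enumerate(rows):
        for k in range(ncols):
            cols[k] |= ((row >> k) & 1) << l
    return cols


def frobenius_words(a, m):
    # Word k = sum_l a_{k+ml} 2^l, i.e. Atilde_k(theta) when theta = z;
    # multiplying word k by the twiddle factor omega^k would then give Abar.
    return transpose(to_rows(a, m), m)
```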

Matrix transposition: exploit the AVX2 instruction set. The $64 \times m$ transposition is reduced to $64 \times 256$ transpositions, and these to $8 \times 8$ ones; 4 packed $8 \times 8$ matrices are transposed at once using vector instructions.
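
The $8 \times 8$ base case fits in one 64-bit word. The Python sketch below uses the classical three-round mask-and-shift transposition (as found in Hacker's Delight), which swaps off-diagonal $1 \times 1$, then $2 \times 2$, then $4 \times 4$ blocks; entry $(r, c)$ is assumed to sit at bit $8r + c$. It is a scalar illustration only: the slide's AVX2 code transposes four such matrices at once in a 256-bit register, and we do not claim that it uses exactly this sequence of steps.

```python
def transpose8x8(x):
    """Transpose an 8x8 bit matrix packed in a 64-bit word (entry (r, c) at
    bit 8r + c) by swapping off-diagonal 1x1, 2x2 then 4x4 sub-blocks."""
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA   # swap (r, c) <-> (r+1, c-1)
    x ^= t ^ (t << 7)
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC  # swap 2x2 blocks
    x ^= t ^ (t << 14)
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0  # swap 4x4 blocks
    x ^= t ^ (t << 28)
    return x


def transpose8x8_naive(x):
    """Reference version: move bit 8r + c to bit 8c + r."""
    y = 0
    for r in range(8):
        for c in range(8):
            y |= ((x >> (8 * r + c)) & 1) << (8 * c + r)
    return y
```

For example, a full first row (`0xFF`) becomes a full first column, `0x0101010101010101`.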

Finally, the efficient FFT over $\mathbb{F}_{2^{60}}$ is called on $\bar{A}$.

Implementation details: Timings
[Figure: timings (ms) as a function of input size (quadwords) for the old implementation, Chen et al., gf2x version 1.2 and the new implementation.]

[Figure: timings (ms) as a function of input size (quadwords), old implementation vs. new implementation.]

Perspectives
Better use of vector instructions:
- vectorize the FFT routine over $\mathbb{F}_{2^{60}}$;
- support for the new AVX-512.
Others:
- use the Truncated Fourier Transform (to reduce the staircase effect);
- generalization to other finite fields.

Questions? Thank you for your attention.


More information

On the computational complexity of mathematical functions

On the computational complexity of mathematical functions On the computational complexity of mathematical functions Jean-Pierre Demailly Institut Fourier, Université de Grenoble I & Académie des Sciences, Paris (France) November 26, 2011 KVPY conference at Vijyoshi

More information

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases D. J. Bernstein University of Illinois at Chicago NSF ITR 0716498 Part I. Linear maps Consider computing 0

More information

Newton s method and FFT trading

Newton s method and FFT trading Newton s method and FFT trading Joris van der Hoeven Dépt. de Mathématiques (Bât. 425) CNRS, Université Paris-Sud 91405 Orsay Cedex France Email: joris@texmacs.org December 9, 2008 Let C[[z]] be the ring

More information

4.3 The Discrete Fourier Transform (DFT) and the Fast Fourier Transform (FFT)

4.3 The Discrete Fourier Transform (DFT) and the Fast Fourier Transform (FFT) CHAPTER. TIME-FREQUECY AALYSIS: FOURIER TRASFORMS AD WAVELETS.3 The Discrete Fourier Transform (DFT and the Fast Fourier Transform (FFT.3.1 Introduction In this section, we discuss some of the mathematics

More information

Toward High Performance Matrix Multiplication for Exact Computation

Toward High Performance Matrix Multiplication for Exact Computation Toward High Performance Matrix Multiplication for Exact Computation Pascal Giorgi Joint work with Romain Lebreton (U. Waterloo) Funded by the French ANR project HPAC Séminaire CASYS - LJK, April 2014 Motivations

More information

Schönhage-Strassen Algorithm with MapReduce for Multiplying Terabit Integers (April 29, 2011)

Schönhage-Strassen Algorithm with MapReduce for Multiplying Terabit Integers (April 29, 2011) Schönhage-Strassen Algorithm with MapReduce for Multiplying Terabit Integers (April 29, 2011) Tsz-Wo Sze Yahoo! Cloud Platform 701 First Avenue Sunnyvale, CA 94089, USA tsz@yahoo-inc.com ABSTRACT We present

More information

FFT: Fast Polynomial Multiplications

FFT: Fast Polynomial Multiplications FFT: Fast Polynomial Multiplications Jie Wang University of Massachusetts Lowell Department of Computer Science J. Wang (UMass Lowell) FFT: Fast Polynomial Multiplications 1 / 20 Overview So far we have

More information

Computing Characteristic Polynomials of Matrices of Structured Polynomials

Computing Characteristic Polynomials of Matrices of Structured Polynomials Computing Characteristic Polynomials of Matrices of Structured Polynomials Marshall Law and Michael Monagan Department of Mathematics Simon Fraser University Burnaby, British Columbia, Canada mylaw@sfu.ca

More information

Elliptic Curves Spring 2013 Lecture #3 02/12/2013

Elliptic Curves Spring 2013 Lecture #3 02/12/2013 18.783 Elliptic Curves Spring 2013 Lecture #3 02/12/2013 3.1 Arithmetic in finite fields To make explicit computations with elliptic curves over finite fields, we need to know how to perform arithmetic

More information

Algorithms and data structures

Algorithms and data structures Algorithms and data structures Amin Coja-Oghlan LFCS Complex numbers Roots of polynomials A polynomial of degree d is a function of the form p(x) = d a i x i with a d 0. i=0 There are at most d numbers

More information

Lecture 20: Discrete Fourier Transform and FFT

Lecture 20: Discrete Fourier Transform and FFT EE518 Digital Signal Processing University of Washington Autumn 2001 Dept of Electrical Engineering Lecture 20: Discrete Fourier Transform and FFT Dec 10, 2001 Prof: J Bilmes TA:

More information

Smoothness Testing of Polynomials over Finite Fields

Smoothness Testing of Polynomials over Finite Fields Smoothness Testing of Polynomials over Finite Fields Jean-François Biasse and Michael J. Jacobson Jr. Department of Computer Science, University of Calgary 2500 University Drive NW Calgary, Alberta, Canada

More information

Appendix C: Recapitulation of Numerical schemes

Appendix C: Recapitulation of Numerical schemes Appendix C: Recapitulation of Numerical schemes August 31, 2009) SUMMARY: Certain numerical schemes of general use are regrouped here in order to facilitate implementations of simple models C1 The tridiagonal

More information

Literature Review: Adaptive Polynomial Multiplication

Literature Review: Adaptive Polynomial Multiplication Literature Review: Adaptive Polynomial Multiplication Daniel S. Roche November 27, 2007 While output-sensitive algorithms have gained a fair amount of popularity in the computer algebra community, adaptive

More information

Mid-term Exam Answers and Final Exam Study Guide CIS 675 Summer 2010

Mid-term Exam Answers and Final Exam Study Guide CIS 675 Summer 2010 Mid-term Exam Answers and Final Exam Study Guide CIS 675 Summer 2010 Midterm Problem 1: Recall that for two functions g : N N + and h : N N +, h = Θ(g) iff for some positive integer N and positive real

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2017) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

Faster polynomial multiplication over finite fields

Faster polynomial multiplication over finite fields Faster polynomial multiplication over finite fields David Harvey, Joris Van Der Hoeven, Grégoire Lecerf To cite this version: David Harvey, Joris Van Der Hoeven, Grégoire Lecerf. Faster polynomial multiplication

More information

RSA Implementation. Oregon State University

RSA Implementation. Oregon State University RSA Implementation Çetin Kaya Koç Oregon State University 1 Contents: Exponentiation heuristics Multiplication algorithms Computation of GCD and Inverse Chinese remainder algorithm Primality testing 2

More information

Algorithms for exact (dense) linear algebra

Algorithms for exact (dense) linear algebra Algorithms for exact (dense) linear algebra Gilles Villard CNRS, Laboratoire LIP ENS Lyon Montagnac-Montpezat, June 3, 2005 Introduction Problem: Study of complexity estimates for basic problems in exact

More information

Exact Arithmetic on a Computer

Exact Arithmetic on a Computer Exact Arithmetic on a Computer Symbolic Computation and Computer Algebra William J. Turner Department of Mathematics & Computer Science Wabash College Crawfordsville, IN 47933 Tuesday 21 September 2010

More information

Fast Polynomial Multiplication

Fast Polynomial Multiplication Fast Polynomial Multiplication Marc Moreno Maza CS 9652, October 4, 2017 Plan Primitive roots of unity The discrete Fourier transform Convolution of polynomials The fast Fourier transform Fast convolution

More information

McBits: Fast code-based cryptography

McBits: Fast code-based cryptography McBits: Fast code-based cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands Joint work with Daniel Bernstein, Tung Chou December 17, 2013 IMA International Conference on Cryptography

More information

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication

CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform

More information

The Fast Fourier Transform. Andreas Klappenecker

The Fast Fourier Transform. Andreas Klappenecker The Fast Fourier Transform Andreas Klappenecker Motivation There are few algorithms that had more impact on modern society than the fast Fourier transform and its relatives. The applications of the fast

More information

arxiv: v1 [cs.na] 8 Feb 2016

arxiv: v1 [cs.na] 8 Feb 2016 Toom-Coo Multiplication: Some Theoretical and Practical Aspects arxiv:1602.02740v1 [cs.na] 8 Feb 2016 M.J. Kronenburg Abstract Toom-Coo multiprecision multiplication is a well-nown multiprecision multiplication

More information

Balanced dense polynomial multiplication on multi-cores

Balanced dense polynomial multiplication on multi-cores Balanced dense polynomial multiplication on multi-cores The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher

More information

Some long-period random number generators using shifts and xors

Some long-period random number generators using shifts and xors ANZIAM J. 48 (CTAC2006) pp.c188 C202, 2007 C188 Some long-period random number generators using shifts and xors Richard P. Brent 1 (Received 6 July 2006; revised 2 July 2007) Abstract Marsaglia recently

More information

Review: Linear and Vector Algebra

Review: Linear and Vector Algebra Review: Linear and Vector Algebra Points in Euclidean Space Location in space Tuple of n coordinates x, y, z, etc Cannot be added or multiplied together Vectors: Arrows in Space Vectors are point changes

More information

CS 4424 Matrix multiplication

CS 4424 Matrix multiplication CS 4424 Matrix multiplication 1 Reminder: matrix multiplication Matrix-matrix product. Starting from a 1,1 a 1,n A =.. and B = a n,1 a n,n b 1,1 b 1,n.., b n,1 b n,n we get AB by multiplying A by all columns

More information

A heuristic quasi-polynomial algorithm for discrete logarithm in small characteristic

A heuristic quasi-polynomial algorithm for discrete logarithm in small characteristic ECC, Chennai October 8, 2014 A heuristic quasi-polynomial algorithm for discrete logarithm in small characteristic Razvan Barbulescu 1 Pierrick Gaudry 2 Antoine Joux 3 Emmanuel Thomé 2 IMJ-PRG, Paris Loria,

More information

Arithmetic Operators for Pairing-Based Cryptography

Arithmetic Operators for Pairing-Based Cryptography Arithmetic Operators for Pairing-Based Cryptography J.-L. Beuchat 1 N. Brisebarre 2 J. Detrey 3 E. Okamoto 1 1 University of Tsukuba, Japan 2 École Normale Supérieure de Lyon, France 3 Cosec, b-it, Bonn,

More information

Sparsity Matters. Robert J. Vanderbei September 20. IDA: Center for Communications Research Princeton NJ.

Sparsity Matters. Robert J. Vanderbei September 20. IDA: Center for Communications Research Princeton NJ. Sparsity Matters Robert J. Vanderbei 2017 September 20 http://www.princeton.edu/ rvdb IDA: Center for Communications Research Princeton NJ The simplex method is 200 times faster... The simplex method is

More information

MA3232 Numerical Analysis Week 9. James Cooley (1926-)

MA3232 Numerical Analysis Week 9. James Cooley (1926-) MA umerical Analysis Week 9 James Cooley (96-) James Cooley is an American mathematician. His most significant contribution to the world of mathematics and digital signal processing is the Fast Fourier

More information

Lucas Lehmer primality test - Wikipedia, the free encyclopedia

Lucas Lehmer primality test - Wikipedia, the free encyclopedia Lucas Lehmer primality test From Wikipedia, the free encyclopedia In mathematics, the Lucas Lehmer test (LLT) is a primality test for Mersenne numbers. The test was originally developed by Edouard Lucas

More information

feb abhi shelat FFT,Median

feb abhi shelat FFT,Median L8 feb 16 2016 abhi shelat FFT,Median merge-sort (A, p, r) if pn B[k] A[i];

More information