Implementing Fast Carryless Multiplication
Implementing Fast Carryless Multiplication
Joris van der Hoeven, Robin Larrieu and Grégoire Lecerf
CNRS & École polytechnique
MACIS 2017, Nov. 15, Vienna, Austria

Outline
- Introduction
  - Carryless multiplication
  - State of the art
- Presentation of the algorithm
- Implementation details
- Perspectives

Carryless multiplication
Multiplication in $\mathbb{F}_2[X]$, large degree (typically $10^6$). Fast algorithms for such sizes use FFT multiplication.
Problem: there are not many evaluation points in $\mathbb{F}_2$, so one has to work in an extension field. How to minimize the corresponding overhead?

State of the art
1. Triadic Schönhage-Strassen algorithm (gf2x: Brent, Gaudry, Thomé, Zimmermann)
2. FFT over $\mathbb{F}_{2^{60}}$ (Harvey, van der Hoeven, Lecerf 2016)
3. Additive FFT over extension fields of $\mathbb{F}_2$ (Chen, Cheng, Kuo, Li, Yang 2017)
This work (footnote 1): improvement of strategy no. 2 using ideas from the Frobenius FFT algorithm (van der Hoeven, Larrieu 2017). Achieves a speedup by a factor of 2.
Footnote 1: source code available from our SVN server, in the justinline library.

Why $\mathbb{F}_{2^{60}}$?
Efficient arithmetic in $\mathbb{F}_{2^{60}}$
- Slightly smaller than a machine word
- $\mu(X) := \frac{X^{61} - 1}{X - 1}$ is irreducible over $\mathbb{F}_2$
Efficient FFT
- Roots of unity with order $2^{60} - 1 = 3^2 \cdot 5^2 \cdot 7 \cdot 11 \cdot 13 \cdot 31 \cdot 41 \cdot 61 \cdot 151 \cdot 331 \cdot 1321$
Bonus
- 61 divides $2^{60} - 1$ (Fermat's theorem)
- 2 generates $(\mathbb{Z}/61\mathbb{Z})^*$ (equivalent to $\mu(X)$ being irreducible)

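The arithmetic facts on this slide are easy to check numerically. The following is a minimal sketch; the helper names `multiplicative_order`, `product_of_factors` and `FACTORS` are chosen here for illustration and do not come from the talk.

```python
# Sanity checks for the number-theoretic facts quoted on the slide.
# Helper names are illustrative assumptions, not part of the authors' code.

FACTORS = {3: 2, 5: 2, 7: 1, 11: 1, 13: 1, 31: 1, 41: 1, 61: 1,
           151: 1, 331: 1, 1321: 1}


def product_of_factors(factors):
    """Multiply out a {prime: exponent} dictionary."""
    n = 1
    for prime, exponent in factors.items():
        n *= prime ** exponent
    return n


def multiplicative_order(a, p):
    """Smallest k >= 1 with a^k = 1 (mod p); assumes gcd(a, p) = 1."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


if __name__ == "__main__":
    # The multiplicative group of F_{2^60} has order 2^60 - 1.
    print(product_of_factors(FACTORS) == 2**60 - 1)
    # 2 has order 60 modulo 61, i.e. it generates (Z/61Z)^*.
    print(multiplicative_order(2, 61))
```

Because the order $2^{60} - 1$ has many small prime factors, there are roots of unity of many different smooth orders to choose from, which is what makes mixed-radix FFTs over this field convenient.
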
Outline
- Introduction
- Presentation of the algorithm
  - Kronecker segmentation vs. Frobenius FFT
  - Our variant of the Frobenius FFT
  - Frobenius encoding
- Implementation details
- Perspectives

Kronecker segmentation vs. Frobenius FFT
Naive strategy: $\mathbb{F}_2[X]_{<n} \to \mathbb{F}_{2^{60}}[X]_{<n}$
- $A, B \in \mathbb{F}_2[X]$ are mapped to $\tilde{A}, \tilde{B} \in \mathbb{F}_{2^{60}}[X]$
- $\tilde{A}\tilde{B} \in \mathbb{F}_{2^{60}}[X]$ is computed and mapped back to $AB \in \mathbb{F}_2[X]$
Kronecker segmentation: $\mathbb{F}_2[X]_{<n} \to \mathbb{F}_2[X]_{<30}[Z]_{<n/30} \to \mathbb{F}_{2^{60}}[Z]_{<n/30}$
- $A, B \in \mathbb{F}_2[X]$ are mapped to $\tilde{A}, \tilde{B} \in \mathbb{F}_2[X]_{<30}[Z]$
- $\tilde{A}\tilde{B} \in \mathbb{F}_2[X]_{<60}[Z]$ is computed
- Substituting $Z = X^{30}$ gives $AB \in \mathbb{F}_2[X]$
Frobenius FFT
- For $\omega$ a root of unity, $\phi : x \mapsto x^2$ acts on $\{1, \omega, \omega^2, \omega^3, \ldots\}$.
- The naive DFT $A \mapsto [A(1), A(\omega), A(\omega^2), \ldots]$ causes redundant computation: for all $A \in \mathbb{F}_2[X]$ and $x \in \mathbb{F}_{2^{60}}$, $A(x^2) = A(x)^2$.

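Kronecker segmentation as described above can be sketched with Python integers as bit-packed polynomials over $\mathbb{F}_2$. This is an illustration only: the inner product of the chunk polynomials is done by schoolbook multiplication here, whereas the talk uses an FFT over $\mathbb{F}_{2^{60}}$; the function names are assumptions.

```python
def clmul(a, b):
    """Schoolbook carryless product of two bit-packed polynomials over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def split_chunks(p, chunk):
    """Write p = sum_i p_i(X) * Z^i with Z = X^chunk and deg p_i < chunk."""
    mask = (1 << chunk) - 1
    out = []
    while p:
        out.append(p & mask)
        p >>= chunk
    return out or [0]


def kronecker_multiply(a, b, chunk=30):
    """Multiply in F_2[X] via polynomials in Z with small coefficients."""
    ca, cb = split_chunks(a, chunk), split_chunks(b, chunk)
    # Product in F_2[X]_{<2*chunk}[Z]. Each coefficient product has degree
    # below 2*chunk - 1 = 59, so it would fit in F_{2^60} without wrapping.
    prod = [0] * (len(ca) + len(cb) - 1)
    for i, x in enumerate(ca):
        for j, y in enumerate(cb):
            prod[i + j] ^= clmul(x, y)
    # Substitute Z = X^chunk; overlapping parts add up with XOR.
    result = 0
    for i, c in enumerate(prod):
        result ^= c << (chunk * i)
    return result
```

Each 60-bit field element carries only 30 bits of input, which is the factor-2 overhead that the Frobenius FFT aims to remove.
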
Our variant of the Frobenius FFT
- $\mathrm{DFT}_\omega$ applied directly to $A \in \mathbb{F}_2[X]_{<60m}$ produces 61 blocks of values: $[A(\omega^{61i})]$, $[A(\omega^{61i+1})]$, $[A(\omega^{61i+2})]$, $[A(\omega^{61i+3})]$, $[A(\omega^{61i+4})]$, ...
- The Frobenius map $\phi$ relates these blocks to one another, since $A(x^2) = A(x)^2$.
- Frobenius encoding maps $A$ to $\bar{A} \in \mathbb{F}_{2^{60}}[X]_{<m}$, and a DFT of length $m$ applied to $\bar{A}$ yields the single block $[A(\omega^{61i+1})]$.

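The redundancy can be observed in a toy setting. The sketch below assumes a small analogue of the talk's parameters: $\mathbb{F}_{16} = \mathbb{F}_2[z]/(z^4+z^3+z^2+z+1)$ stands in for $\mathbb{F}_{2^{60}}$, and 5 stands in for 61. It checks $A(x^2) = A(x)^2$ and lists which exponents $e$ of $\omega^e$ are reached from the block $\{pi+1\}$ by repeated squaring. All names are illustrative.

```python
MU = 0b11111  # toy modulus z^4 + z^3 + z^2 + z + 1 = (z^5 - 1)/(z - 1)


def fmul(a, b):
    """Product in the toy field F_16 = F_2[z]/MU (elements are 4-bit ints)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    for i in range(r.bit_length() - 1, 3, -1):  # reduce degrees >= 4
        if (r >> i) & 1:
            r ^= MU << (i - 4)
    return r


def evaluate(A, x):
    """Horner evaluation of a bit-packed polynomial A in F_2[X] at x."""
    r = 0
    for i in range(A.bit_length() - 1, -1, -1):
        r = fmul(r, x) ^ ((A >> i) & 1)
    return r


def frobenius_identity_holds(A):
    """Check A(x^2) == A(x)^2 for every x in the toy field."""
    return all(
        evaluate(A, fmul(x, x)) == fmul(evaluate(A, x), evaluate(A, x))
        for x in range(16)
    )


def exponents_reached(m=3, p=5, degree=4):
    """Exponents e (mod m*p) of the form (p*i + 1) * 2^j, i < m, j < degree."""
    n = m * p
    reached = set()
    for i in range(m):
        e = (p * i + 1) % n
        for _ in range(degree):
            reached.add(e)
            e = (2 * e) % n
    return reached
```

With the talk's sizes (`p=61`, `degree=60`) the same function reaches every exponent that is not a multiple of 61, which reflects the fact that 2 generates $(\mathbb{Z}/61\mathbb{Z})^*$.
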
Multiplication algorithm
Assumptions: $a + b < 60m$, and $61m$ divides $2^{60} - 1$.
1. Frobenius encoding: $A \in \mathbb{F}_2[X]_{<a} \mapsto \bar{A} \in \mathbb{F}_{2^{60}}[X]_{<a/60}$ and $B \in \mathbb{F}_2[X]_{<b} \mapsto \bar{B} \in \mathbb{F}_{2^{60}}[X]_{<b/60}$
2. DFT: $E_\omega(A) \in \mathbb{F}_{2^{60}}^m$ and $E_\omega(B) \in \mathbb{F}_{2^{60}}^m$
3. Pointwise product: $E_\omega(AB) \in \mathbb{F}_{2^{60}}^m$
4. Inverse DFT: $\overline{AB} \in \mathbb{F}_{2^{60}}[X]_{<m}$
5. Frobenius decoding: $AB \in \mathbb{F}_2[X]_{<a+b}$

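The five steps can be exercised end to end on a toy instance. The sketch below is not the authors' implementation: it assumes $\mathbb{F}_{16} = \mathbb{F}_2[z]/(z^4+z^3+z^2+z+1)$ in place of $\mathbb{F}_{2^{60}}$ (so 4 replaces 60 and 5 replaces 61), takes $m = 3$, uses naive DFTs instead of FFTs, and finds $\omega$ by brute-force search. All identifiers are hypothetical.

```python
D, P, M = 4, 5, 3   # toy stand-ins for 60, 61 and the transform length m
MU = 0b11111        # mu(z) = (z^5 - 1)/(z - 1), defines the toy field F_16


def clmul(a, b):
    """Schoolbook carryless product of bit-packed polynomials over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def fmul(a, b):
    """Product in F_16: carryless product reduced modulo MU."""
    r = clmul(a, b)
    for i in range(r.bit_length() - 1, D - 1, -1):
        if (r >> i) & 1:
            r ^= MU << (i - D)
    return r


def fpow(a, e):
    r = 1
    for _ in range(e):
        r = fmul(r, a)
    return r


# omega has order P*M = 15 and satisfies omega^M = z (i.e. theta = z).
OMEGA = next(g for g in range(2, 16)
             if fpow(g, M) == 0b10 and fpow(g, P) != 1)
OMEGA_INV = fpow(OMEGA, P * M - 1)


def dft(coeffs, root):
    """Naive DFT: values of sum_k coeffs[k] * Z^k at root^i for i < M."""
    return [
        _xor_sum(fmul(c, fpow(root, i * k)) for k, c in enumerate(coeffs))
        for i in range(M)
    ]


def _xor_sum(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc


def encode(A):
    """Frobenius encoding: coefficient k is omega^k * A~_k(theta)."""
    coeffs = []
    for k in range(M):
        a_k = 0
        for l in range(D):                      # gather bits k, k+M, k+2M, ...
            a_k |= ((A >> (k + M * l)) & 1) << l
        coeffs.append(fmul(fpow(OMEGA, k), a_k))
    return coeffs


def decode(coeffs):
    """Inverse of encode: undo the twiddle factors, scatter the bits back."""
    C = 0
    for k, c in enumerate(coeffs):
        a_k = fmul(fpow(OMEGA_INV, k), c)
        for l in range(D):
            C |= ((a_k >> l) & 1) << (k + M * l)
    return C


def frobenius_multiply(A, B):
    """Product of A and B in F_2[X], assuming deg(A) + deg(B) < D*M."""
    w = fpow(OMEGA, P)          # root of unity of order M
    w_inv = fpow(w, M - 1)
    ea, eb = dft(encode(A), w), dft(encode(B), w)
    ec = [fmul(x, y) for x, y in zip(ea, eb)]
    # Inverse DFT: the factor 1/M equals 1 in characteristic 2 (M is odd).
    return decode(dft(ec, w_inv))
```

In this toy instance every field element of the transform carries 4 bits of input, so no space is lost compared with the 2 bits per element that Kronecker segmentation would use.
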
Frobenius encoding
Cooley-Tukey FFT:
$$A(\omega^{61i+1}) = \sum_{k<m} \omega^k \Big( \sum_{l<61} a_{k+ml}\, \omega^{ml} \Big) \omega^{61ki}$$
With $\theta := \omega^m$ and $\omega' := \omega^{61}$:
$$\tilde{A}_k := \sum_{l<61} a_{k+ml} X^l \in \mathbb{F}_2[X]_{<60} \qquad (A \in \mathbb{F}_2[X]_{<60m})$$
$$\bar{A} = \sum_{k<m} \omega^k \tilde{A}_k(\theta) Z^k \in \mathbb{F}_{2^{60}}[Z]_{<m}$$
Technical assumption: assume $\omega$ is chosen such that $\theta = z \bmod \mu(z)$, with $\mu(z) := \frac{z^{61} - 1}{z - 1}$.

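The step from the Cooley-Tukey formula to the encoding can be written out as follows. This is a reconstruction from the definitions on the slide, assuming $\omega$ has order $61m$ and writing each index as $k + ml$ with $k < m$:

```latex
\begin{align*}
A(\omega^{61i+1})
  &= \sum_{k<m} \sum_{l<61} a_{k+ml}\, \omega^{(k+ml)(61i+1)} \\
  &= \sum_{k<m} \omega^{k}\, \omega^{61ki}
     \sum_{l<61} a_{k+ml}\, \omega^{ml}\, \omega^{61mli} \\
  &= \sum_{k<m} \omega^{k} \Big( \sum_{l<61} a_{k+ml}\, \theta^{l} \Big) (\omega')^{ki}
     && \text{since } \omega^{61m} = 1,\ \theta = \omega^{m},\ \omega' = \omega^{61} \\
  &= \sum_{k<m} \omega^{k}\, \tilde{A}_k(\theta)\, (\omega')^{ki}
   = \bar{A}\big((\omega')^{i}\big).
\end{align*}
```

So the values $A(\omega^{61i+1})$ for $i < m$ are exactly the length-$m$ DFT of $\bar{A}$ with respect to $\omega'$. Under the technical assumption $\theta = z \bmod \mu(z)$, evaluating $\tilde{A}_k$ at $\theta$ costs nothing: the coefficients of $\tilde{A}_k$ are already the bits of the field element.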
Outline
- Introduction
- Presentation of the algorithm
- Implementation details
  - Data Representation
  - Frobenius encoding
  - Timings
- Perspectives

Data Representation
- Polynomials over $\mathbb{F}_2$ in packed representation.
- Elements of $\mathbb{F}_{2^{60}}$ on one machine word; polynomials over $\mathbb{F}_{2^{60}}$ as an array of words.
- Matrices over $\mathbb{F}_2$ in packed column representation.
$A \in \mathbb{F}_2[X]_{<60m}$ is viewed as a $60 \times m$ matrix, from which the $\tilde{A}_k(X)$ are read off.

Frobenius encoding
- See $A$ as a $60 \times m$ matrix; add 4 columns for alignment.
- Transpose the $64 \times m$ matrix (this gives $[\tilde{A}_k(\theta)]_{k<m}$).
- Multiply by the twiddle factors $\omega^k$ (this gives $\bar{A}$).
Matrix transposition
- Exploit the AVX2 instruction set.
- Reduction: $(64 \times m) \to (64 \times 256) \to (8 \times 8)$.
- Transpose 4 packed $8 \times 8$ matrices at once using vector instructions.
Finally, call the efficient FFT over $\mathbb{F}_{2^{60}}$ on $\bar{A}$.

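The base case of the transposition, one $8 \times 8$ bit matrix packed in a 64-bit word, can be sketched in scalar code with the classic mask-and-shift block swaps. This is not the authors' AVX2 routine, which handles four such matrices per vector register; the bit layout (entry $(r, c)$ stored at bit $8r + c$) is an assumption made for this example.

```python
def transpose8x8(x):
    """Transpose an 8x8 bit matrix packed in a 64-bit integer.

    Assumed layout: entry (r, c) is bit 8*r + c of x.
    """
    # Swap the off-diagonal bits inside every 2x2 block.
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA
    x = x ^ t ^ (t << 7)
    # Swap the off-diagonal 2x2 blocks inside every 4x4 block.
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC
    x = x ^ t ^ (t << 14)
    # Swap the two off-diagonal 4x4 blocks.
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0
    x = x ^ t ^ (t << 28)
    return x


def get_bit(x, r, c):
    """Read entry (r, c) of a packed 8x8 bit matrix."""
    return (x >> (8 * r + c)) & 1
```

Three rounds of block swaps suffice because $8 = 2^3$; the same recursive pattern extends to the larger $64 \times 256$ blocks mentioned on the slide.
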
Timings
[Plot: timings (ms) against input size (quadwords) for the old implementation, Chen et al., gf2x version 1.2, and the new implementation.]
[Plot: timings (ms) against input size (quadwords) for the old implementation and the new implementation.]

Outline
- Introduction
- Presentation of the algorithm
- Implementation details
- Perspectives

Perspectives
Better use of vector instructions
- Vectorize the FFT routine over $\mathbb{F}_{2^{60}}$.
- Support for the new AVX-512.
Others
- Use the Truncated Fourier Transform (reduce the staircase effect).
- Generalization for other finite fields.

Questions?
Thank you for your attention.
More information5. Orthogonal matrices
L Vandenberghe EE133A (Spring 2017) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal
More informationFaster polynomial multiplication over finite fields
Faster polynomial multiplication over finite fields David Harvey, Joris Van Der Hoeven, Grégoire Lecerf To cite this version: David Harvey, Joris Van Der Hoeven, Grégoire Lecerf. Faster polynomial multiplication
More informationRSA Implementation. Oregon State University
RSA Implementation Çetin Kaya Koç Oregon State University 1 Contents: Exponentiation heuristics Multiplication algorithms Computation of GCD and Inverse Chinese remainder algorithm Primality testing 2
More informationAlgorithms for exact (dense) linear algebra
Algorithms for exact (dense) linear algebra Gilles Villard CNRS, Laboratoire LIP ENS Lyon Montagnac-Montpezat, June 3, 2005 Introduction Problem: Study of complexity estimates for basic problems in exact
More informationExact Arithmetic on a Computer
Exact Arithmetic on a Computer Symbolic Computation and Computer Algebra William J. Turner Department of Mathematics & Computer Science Wabash College Crawfordsville, IN 47933 Tuesday 21 September 2010
More informationFast Polynomial Multiplication
Fast Polynomial Multiplication Marc Moreno Maza CS 9652, October 4, 2017 Plan Primitive roots of unity The discrete Fourier transform Convolution of polynomials The fast Fourier transform Fast convolution
More informationMcBits: Fast code-based cryptography
McBits: Fast code-based cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands Joint work with Daniel Bernstein, Tung Chou December 17, 2013 IMA International Conference on Cryptography
More informationCPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication
CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform
More informationThe Fast Fourier Transform. Andreas Klappenecker
The Fast Fourier Transform Andreas Klappenecker Motivation There are few algorithms that had more impact on modern society than the fast Fourier transform and its relatives. The applications of the fast
More informationarxiv: v1 [cs.na] 8 Feb 2016
Toom-Coo Multiplication: Some Theoretical and Practical Aspects arxiv:1602.02740v1 [cs.na] 8 Feb 2016 M.J. Kronenburg Abstract Toom-Coo multiprecision multiplication is a well-nown multiprecision multiplication
More informationBalanced dense polynomial multiplication on multi-cores
Balanced dense polynomial multiplication on multi-cores The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher
More informationSome long-period random number generators using shifts and xors
ANZIAM J. 48 (CTAC2006) pp.c188 C202, 2007 C188 Some long-period random number generators using shifts and xors Richard P. Brent 1 (Received 6 July 2006; revised 2 July 2007) Abstract Marsaglia recently
More informationReview: Linear and Vector Algebra
Review: Linear and Vector Algebra Points in Euclidean Space Location in space Tuple of n coordinates x, y, z, etc Cannot be added or multiplied together Vectors: Arrows in Space Vectors are point changes
More informationCS 4424 Matrix multiplication
CS 4424 Matrix multiplication 1 Reminder: matrix multiplication Matrix-matrix product. Starting from a 1,1 a 1,n A =.. and B = a n,1 a n,n b 1,1 b 1,n.., b n,1 b n,n we get AB by multiplying A by all columns
More informationA heuristic quasi-polynomial algorithm for discrete logarithm in small characteristic
ECC, Chennai October 8, 2014 A heuristic quasi-polynomial algorithm for discrete logarithm in small characteristic Razvan Barbulescu 1 Pierrick Gaudry 2 Antoine Joux 3 Emmanuel Thomé 2 IMJ-PRG, Paris Loria,
More informationArithmetic Operators for Pairing-Based Cryptography
Arithmetic Operators for Pairing-Based Cryptography J.-L. Beuchat 1 N. Brisebarre 2 J. Detrey 3 E. Okamoto 1 1 University of Tsukuba, Japan 2 École Normale Supérieure de Lyon, France 3 Cosec, b-it, Bonn,
More informationSparsity Matters. Robert J. Vanderbei September 20. IDA: Center for Communications Research Princeton NJ.
Sparsity Matters Robert J. Vanderbei 2017 September 20 http://www.princeton.edu/ rvdb IDA: Center for Communications Research Princeton NJ The simplex method is 200 times faster... The simplex method is
More informationMA3232 Numerical Analysis Week 9. James Cooley (1926-)
MA umerical Analysis Week 9 James Cooley (96-) James Cooley is an American mathematician. His most significant contribution to the world of mathematics and digital signal processing is the Fast Fourier
More informationLucas Lehmer primality test - Wikipedia, the free encyclopedia
Lucas Lehmer primality test From Wikipedia, the free encyclopedia In mathematics, the Lucas Lehmer test (LLT) is a primality test for Mersenne numbers. The test was originally developed by Edouard Lucas
More informationfeb abhi shelat FFT,Median
L8 feb 16 2016 abhi shelat FFT,Median merge-sort (A, p, r) if pn B[k] A[i];
More information