Fast reversion of formal power series


Fast reversion of formal power series
Fredrik Johansson
LFANT, INRIA Bordeaux
RAIM, 2016-06-29, Banyuls-sur-mer

Reversion of power series

F = exp(x) − 1 = x + x^2/2! + x^3/3! + x^4/4! + ...
G = log(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ...

F(G(x)) = G(F(x)) = x, i.e. G = F^{-1} and F = G^{-1}.

Applications:
- Combinatorics, recurrences (example: counting binary trees)
- Numerical approximation of F^{-1}
- Asymptotic analysis

Operations in R[[x]]/<x^n>

F = F(x) = Σ_{k=0}^{n-1} f_k x^k,   [x^k] F(x) = f_k ∈ R

Addition:        F + G = Σ_{k=0}^{n-1} (f_k + g_k) x^k
Multiplication:  FG = Σ_{k=0}^{n-1} (Σ_{i=0}^{k} f_i g_{k-i}) x^k
Reciprocal:      if f_0 is a unit, find G such that FG = 1
Composition:     if g_0 = 0, F(G) = Σ_{k=0}^{n-1} f_k G^k
Reversion:       if f_0 = 0 and f_1 is a unit, find G such that F(G) = x
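As an illustrative sketch (not the FLINT implementation discussed later), these operations can be written directly in Python with coefficient lists over Q, using schoolbook multiplication and Horner's rule for composition; all helper names here are mine:

```python
from fractions import Fraction
from math import factorial

def mul(F, G, n):
    """Schoolbook product of truncated series mod x^n (O(n^2) base-ring ops)."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def compose(F, G, n):
    """F(G) mod x^n by Horner's rule; requires g_0 = 0."""
    assert G[0] == 0
    H = [Fraction(0)] * n
    for f in reversed(F[:n]):
        H = mul(H, G, n)
        H[0] += f
    return H

# F = exp(x) - 1 and G = log(1 + x) are compositional inverses:
n = 6
F = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, n)]
G = [Fraction(0)] + [Fraction((-1) ** (k - 1), k) for k in range(1, n)]
print(compose(F, G, n))  # recovers x: coefficients 0, 1, 0, 0, 0, 0
```

Both composition orders return x mod x^6, matching F(G(x)) = G(F(x)) = x from the previous slide.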

Computational complexity

Counting operations in the base ring R.

M(n): multiplying two length-n polynomials
- Classical: M(n) = O(n^2)
- Karatsuba: M(n) = O(n^{log_2 3}) = O(n^{1.59})
- Fast Fourier Transform: M(n) = O(n log^{1+o(1)} n) = O(n^{1+o(1)})

MM(n) = O(n^ω): multiplying two n × n matrices
- Classical: MM(n) = O(n^3)
- Strassen: MM(n) = O(n^{log_2 7}) = O(n^{2.81})
- Coppersmith-Winograd-Le Gall: MM(n) = O(n^{2.3728639})

Newton iteration for power series

If ϕ(G_k) = 0 mod x^n, then (under natural assumptions) ϕ(G_{k+1}) = 0 mod x^{2n}, where

    G_{k+1} = G_k − ϕ(G_k)/ϕ'(G_k).

O(M(n)) computation of 1/F: take ϕ(G) = 1/G − F, giving

    G_{k+1} = 2 G_k − F G_k^2.
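A minimal sketch of the doubling reciprocal iteration over Q (assuming f_0 = 1 for simplicity; in general one first divides through by the unit f_0):

```python
from fractions import Fraction

def mul(F, G, n):
    """Truncated schoolbook product mod x^n."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def reciprocal(F, n):
    """G = 1/F mod x^n via G_{k+1} = 2 G_k - F G_k^2, doubling the precision."""
    assert F[0] == 1
    G = [Fraction(1)]
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)
        FG2 = mul(F, mul(G, G, prec), prec)      # F G_k^2 mod x^prec
        G = [2 * g for g in G] + [Fraction(0)] * (prec - len(G))
        G = [a - b for a, b in zip(G, FG2)]      # 2 G_k - F G_k^2
    return G

# 1/(1 - x) = 1 + x + x^2 + ...
print(reciprocal([Fraction(1), Fraction(-1)], 8))
```

Each pass doubles the number of correct coefficients, so the total cost is a geometric series dominated by the last step, i.e. O(M(n)).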

Reversion using Newton iteration

Given F = f_0 + f_1 x + ... with f_0 = 0 and f_1 invertible, take ϕ(G) = F(G) − x, giving:

    G_{k+1} = G_k − (F(G_k) − x)/F'(G_k).

Brent and Kung, 1978: up to constant factors, composition and reversion cost the same.

Fast composition: elementary functions

Assume that 1, 2, ..., n − 1 are invertible in R.

    F' = Σ_{k=0}^{n-2} (k+1) f_{k+1} x^k,     ∫F = Σ_{k=1}^{n-1} (f_{k-1}/k) x^k

    log(1 + F) = ∫ F'/(1 + F),     atan(F) = ∫ F'/(1 + F^2)

Newton iteration: log → exp, atan → tan.
sin, cos, sinh, cosh, asin, ... using algebraic transformations.

Cost: O(M(n)) = O(n^{1+o(1)}).

Generalization: differential equations (hypergeometric, ...)
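For instance, log(1 + F) really does reduce to one derivative, one reciprocal, one product, and one integral; a sketch over Q (helper names are mine, schoolbook multiplication standing in for a fast one):

```python
from fractions import Fraction

def mul(F, G, n):
    """Truncated schoolbook product mod x^n."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def reciprocal(F, n):
    """1/F mod x^n by Newton iteration (assumes F[0] == 1)."""
    assert F[0] == 1
    G = [Fraction(1)]
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)
        FG2 = mul(F, mul(G, G, prec), prec)
        G = [2 * g for g in G] + [Fraction(0)] * (prec - len(G))
        G = [a - b for a, b in zip(G, FG2)]
    return G

def derivative(F):
    return [k * f for k, f in enumerate(F)][1:]

def integral(F):
    return [Fraction(0)] + [f / Fraction(k + 1) for k, f in enumerate(F)]

def series_log1p(F, n):
    """log(1 + F) = integral of F'/(1 + F); requires F[0] == 0."""
    assert F[0] == 0
    onepF = [Fraction(1)] + F[1:n]
    return integral(mul(derivative(F[:n]), reciprocal(onepF, n - 1), n - 1))

# log(1 + x) = x - x^2/2 + x^3/3 - ...
print(series_log1p([Fraction(0), Fraction(1)], 6))
```

Replacing `mul` and `reciprocal` by quasi-linear versions makes the whole computation O(M(n)), as on the slide.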

Composition of general power series

Horner's rule:
    F(G) = (...(f_{n-1} G + f_{n-2}) G + ... + f_1) G + f_0
Complexity: O(n M(n))

Brent-Kung 2.1: baby-step, giant-step version of Horner's rule
Complexity: O(n^{1/2} M(n) + n^{1/2} MM(n^{1/2}))

Brent-Kung 2.2: divide-and-conquer Taylor expansion
Complexity: O((n log n)^{1/2} M(n))

Brent-Kung algorithm 2.1

Example with n = 9, m = √n = 3. Computing F(G) where F = f_0 + f_1 x + ... + f_8 x^8:

    F(G) = (f_0 + f_1 G + f_2 G^2) + (f_3 + f_4 G + f_5 G^2) G^3 + (f_6 + f_7 G + f_8 G^2) G^6

(m × m)(m × m^2) matrix multiplication (each row of the right factor holding the n coefficients of a power of G):

    [ f_0  f_1  f_2 ]   [ 1   ]
    [ f_3  f_4  f_5 ] · [ G   ]
    [ f_6  f_7  f_8 ]   [ G^2 ]

Final combination using Horner's rule (in G^3).
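The baby-step/giant-step scheme above can be sketched in Python; this is my own illustration with plain coefficient loops standing in for the matrix multiplication:

```python
from fractions import Fraction
from math import isqrt, factorial

def mul(F, G, n):
    """Truncated schoolbook product mod x^n."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def compose_bk21(F, G, n):
    """Brent-Kung 2.1 composition F(G) mod x^n (requires G[0] == 0):
    baby steps G^0, ..., G^{m-1}, then Horner's rule in the giant step G^m."""
    assert G[0] == 0
    m = isqrt(n - 1) + 1                      # m = ceil(sqrt(n)) for n >= 2
    powers = [[Fraction(1)] + [Fraction(0)] * (n - 1)]
    for _ in range(m - 1):
        powers.append(mul(powers[-1], G, n))  # baby steps
    Gm = mul(powers[-1], G, n)                # giant step G^m
    result = [Fraction(0)] * n
    for i in reversed(range(0, n, m)):        # chunks of m coefficients of F
        chunk = [Fraction(0)] * n
        for j in range(m):
            if i + j < n and F[i + j]:
                for k in range(n):
                    chunk[k] += F[i + j] * powers[j][k]
        result = [a + b for a, b in zip(mul(result, Gm, n), chunk)]
    return result

# check F(G) = x for F = exp(x) - 1, G = log(1 + x):
n = 6
F = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, n)]
G = [Fraction(0)] + [Fraction((-1) ** (k - 1), k) for k in range(1, n)]
print(compose_bk21(F, G, n))
```

Only the m + O(1) series products and the final Horner steps in G^m use polynomial multiplication; the chunk evaluation is exactly the (m × m)(m × m^2) matrix product on the slide.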

Complexity of composition algorithms

Horner's rule: O(n M(n))
BK 2.1: O(n^{1/2} M(n) + n^{1/2} MM(n^{1/2}))
BK 2.2: O((n log n)^{1/2} M(n)) = O(n^{3/2+o(1)})

    M(n)       MM(n)         Horner      BK 2.1         BK 2.2
    n^2        n^3           n^3         n^2.5          n^2.5 log^0.5 n
    n^1.59     n^3           n^2.59      n^2.09         n^2.09 log^0.5 n
    n log n    n^3           n^2 log n   n^2            n^1.5 log^1.5 n
    n log n    n^2.3728639   n^2 log n   n^1.687        n^1.5 log^1.5 n
    (n log n)  (n^2)         (n^2 log n) (n^1.5 log n)  (n^1.5 log^1.5 n)

(The last row assumes hypothetical O(n^2) matrix multiplication.)

BK 2.1 wins for small n, BK 2.2 wins for large n.
BK 2.1 wins with hypothetical O(n^2) matrix multiplication.

The Lagrange inversion theorem

The coefficients of F^{-1} are given by

    [x^k] F^{-1} = (1/k) [x^{k-1}] (x/F)^k.

O(n M(n)) algorithm: compute H = x/F using Newton iteration for the reciprocal, then compute H, H^2, H^3, ..., H^{n-1}.

- Better than classical algorithms (O(n^3))
- Similar to Newton iteration + Horner's rule
- Worse than Newton iteration + BK 2.1 or BK 2.2
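The O(n M(n)) algorithm fits in a few lines; a sketch over Q, with my own helper names and schoolbook arithmetic in place of fast multiplication:

```python
from fractions import Fraction
from math import factorial

def mul(F, G, n):
    """Truncated schoolbook product mod x^n."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def reciprocal(F, n):
    """1/F mod x^n by Newton iteration (assumes F[0] == 1)."""
    assert F[0] == 1
    G = [Fraction(1)]
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)
        FG2 = mul(F, mul(G, G, prec), prec)
        G = [2 * g for g in G] + [Fraction(0)] * (prec - len(G))
        G = [a - b for a, b in zip(G, FG2)]
    return G

def revert_lagrange(F, n):
    """F^{-1} mod x^n via [x^k] F^{-1} = (1/k)[x^{k-1}](x/F)^k.
    Requires f_0 = 0 and f_1 invertible."""
    assert F[0] == 0 and F[1] != 0
    FoverX = F[1:n] + [Fraction(0)]          # F/x, constant term f_1
    c = 1 / FoverX[0]
    H = [c * h for h in reciprocal([c * a for a in FoverX], n)]  # H = x/F
    Hk = [Fraction(1)] + [Fraction(0)] * (n - 1)
    b = [Fraction(0)] * n
    for k in range(1, n):
        Hk = mul(Hk, H, n)                   # H^k
        b[k] = Hk[k - 1] / k
    return b

# reverting exp(x) - 1 gives log(1 + x):
n = 6
F = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, n)]
print(revert_lagrange(F, n))
```

The n − 1 products H^k dominate, giving O(n M(n)) as claimed.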

Fast Lagrange inversion

We can extract a single coefficient of H^{a+b} = H^a H^b using O(n) operations:

    [x^k] H^{a+b} = Σ_{i=0}^{k} ([x^i] H^a)([x^{k-i}] H^b)

Compute H, H^2, ..., H^{m-1}, then H^m, H^{2m}, H^{3m}, ..., H^{⌈n/m⌉m}. Choosing m ≈ √n, we need O(√n) polynomial multiplications, plus O(n^2) coefficient operations.

Fast Lagrange inversion, first version

1:  m ← ⌈√n⌉
2:  H ← x/F
3:  for 1 ≤ i < m do
4:      H^{i+1} ← H^i · H
5:      b_i ← (1/i) [x^{i-1}] H^i
6:  end for
7:  T ← H^m
8:  for i = m, 2m, 3m, ..., lm < n do
9:      b_i ← (1/i) [x^{i-1}] T
10:     for 1 ≤ j < m while i + j < n do
11:         b_{i+j} ← (1/(i+j)) Σ_{k=0}^{i+j-1} ([x^k] T)([x^{i+j-k-1}] H^j)
12:     end for
13:     T ← T · H^m
14: end for
15: return b_1 x + b_2 x^2 + ... + b_{n-1} x^{n-1}
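The pseudocode transcribes almost line by line into runnable form; a sketch in Python over Q (my own helper names, schoolbook multiplication), taking H = x/F as input:

```python
from fractions import Fraction
from math import isqrt

def mul(F, G, n):
    """Truncated schoolbook product mod x^n."""
    H = [Fraction(0)] * n
    for i, f in enumerate(F[:n]):
        for j, g in enumerate(G[:n - i]):
            H[i + j] += f * g
    return H

def coeff_of_product(T, U, k):
    """[x^k](T * U) in O(k) operations, without forming the full product."""
    return sum(T[i] * U[k - i] for i in range(k + 1))

def revert_fast_lagrange(H, n):
    """Coefficients b_1, ..., b_{n-1} of F^{-1}, given H = x/F mod x^n."""
    m = isqrt(n) + 1
    powers = [[Fraction(1)] + [Fraction(0)] * (n - 1)]   # H^0
    b = [Fraction(0)] * n
    for i in range(1, m):                                # baby steps H^i
        powers.append(mul(powers[-1], H, n))
        b[i] = powers[i][i - 1] / i
    Hm = mul(powers[-1], H, n)                           # H^m
    T, i = Hm, m
    while i < n:                                         # giant steps T = H^i
        b[i] = T[i - 1] / i
        for j in range(1, m):
            if i + j >= n:
                break
            b[i + j] = coeff_of_product(T, powers[j], i + j - 1) / (i + j)
        T = mul(T, Hm, n)
        i += m
    return b

# Example: F = x/(1 - x), so H = x/F = 1 - x, and F^{-1} = x/(1 + x)
H = [Fraction(1), Fraction(-1)] + [Fraction(0)] * 6
print(revert_fast_lagrange(H, 8))  # coefficients 0, 1, -1, 1, -1, 1, -1, 1
```

Only the baby and giant steps perform full series products (O(√n) of them); every other coefficient is read off with a single O(n) convolution sum, matching the slide's operation count.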

Using matrix multiplication

Example: n = 8, m = ⌈√(n−1)⌉ = 3. We want the coefficients b_1, b_2, ..., b_7 of 1, x, ..., x^6 in the respective powers of H. Write h_{k,i} = [x^i] H^k.

A = [ h_{0,2}  h_{0,1}  h_{0,0}  0        0        0        0        0        0       ]
    [ h_{3,5}  h_{3,4}  h_{3,3}  h_{3,2}  h_{3,1}  h_{3,0}  0        0        0       ]
    [ 0        0        h_{6,6}  h_{6,5}  h_{6,4}  h_{6,3}  h_{6,2}  h_{6,1}  h_{6,0} ]

B = [ 0        0        h_{1,0}  h_{1,1}  h_{1,2}  h_{1,3}  h_{1,4}  h_{1,5}  h_{1,6} ]
    [ 0        h_{2,0}  h_{2,1}  h_{2,2}  h_{2,3}  h_{2,4}  h_{2,5}  h_{2,6}  0       ]
    [ h_{3,0}  h_{3,1}  h_{3,2}  h_{3,3}  h_{3,4}  h_{3,5}  h_{3,6}  0        0       ]

(Entries that only affect the two unused entries of the product are set to 0.)

AB^T = [ b_1  b_2  b_3 ]
       [ b_4  b_5  b_6 ]
       [ b_7  *    *   ]

(after dividing the i-th entry, read rowwise, by i; the entries * are not needed).

Fast Lagrange inversion, matrix version

1:  m ← ⌈√(n − 1)⌉
2:  H ← x/F
3:  {Assemble m × m^2 matrices B and A from H, H^2, ..., H^m and H^m, H^{2m}, H^{3m}, ...}
4:  for 1 ≤ i ≤ m, 1 ≤ j ≤ m^2 do
5:      B_{i,j} ← [x^{i+j-m-1}] H^i
6:      A_{i,j} ← [x^{im-j}] H^{(i-1)m}
7:  end for
8:  C ← A B^T
9:  for 1 ≤ i < n do
10:     b_i ← C_i / i   (C_i is the i-th entry of C read rowwise)
11: end for
12: return b_1 x + b_2 x^2 + ... + b_{n-1} x^{n-1}

Comparison with BK 2.1

Let m = √n.

Fast Lagrange inversion for reversion:
1. 2m + O(1) polynomial multiplications
2. One (m × m^2) times (m^2 × m) matrix multiplication
3. O(n) additional operations

Fast Horner's rule (BK 2.1) for composition:
1. m polynomial multiplications, each with cost M(n)
2. One (m × m) times (m × m^2) matrix multiplication
3. m polynomial multiplications and additions

Both O(n^{1/2}(M(n) + MM(n^{1/2}))), with the same constant factor.

Speedup by avoiding Newton iteration

If composition costs C(n) ~ c n^r, reversion via Newton iteration costs

    C(n) + C(n/2) + C(n/4) + ... ~ c n^r · 2^r/(2^r − 1).

[Figure: the speedup factor 2^r/(2^r − 1) plotted for 2 ≤ r ≤ 5.]

Classical polynomial multiplication (r = 5/2): (4/31)(8 + √2) ≈ 1.214
FFT polynomial multiplication, classical matrix multiplication (r = 2): 4/3
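The quoted constants follow directly from the geometric series; a quick numerical check:

```python
def newton_overhead(r):
    """Cost ratio (reversion via Newton) / (one composition) when composition
    costs ~ c n^r: C(n) + C(n/2) + C(n/4) + ... = C(n) * 2^r / (2^r - 1)."""
    return 2 ** r / (2 ** r - 1)

print(newton_overhead(2.5))  # classical multiplication, r = 5/2: ~1.2147
print(newton_overhead(2.0))  # FFT mult. + classical matmul, r = 2: 4/3
```

Rationalizing 2^{5/2}/(2^{5/2} − 1) = 4√2/(4√2 − 1) gives exactly (4/31)(8 + √2), the value on the slide.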

Faster matrix multiplication

[Diagram: shapes of the matrix products in BK 2.1 ((m × m) times (m × m^2)) and in fast Lagrange inversion ((m × m^2) times (m^2 × m)), where the matrix A in the latter contains blocks of zeros.]

Can we exploit the structure of A when ω < 3?

Faster matrix multiplication, first algorithm

Decompose each m × m block of A into (m/k) × (m/k) blocks and skip the blocks that are identically zero. Grouping the blocks dyadically by scale bounds the asymptotic speedup s:

    s = m^{ω+1} / (total cost of the nonzero blocks) > ( Σ_{k=0}^{∞} 2^{k-1}/2^{kω} )^{-1} = 2 − 2^{2-ω} > 1

Faster matrix multiplication, second algorithm

Write AB^T = (AP)(P^{-1}B^T), where P is a permutation matrix that makes each m × m block in A lower triangular.

Recursive triangular multiplication: R(k) = 4R(k/2) + 2(k/2)^ω + O(k^2)

Speedup: s = 2^{ω-1} − 2.   Note: s > 1 when ω > log_2 6 ≈ 2.585.

Matrix multiplication speedup

[Figure: speedup s of Algorithms 1 and 2 as a function of ω, for 2.0 ≤ ω ≤ 3.0.]

Crossover at ω = 1 + log_2(2 + √2) ≈ 2.771.

s = 2 with classical matrix multiplication (ω = 3)
s = 1.5 with Strassen multiplication
s = 1.228 with Coppersmith-Winograd-Le Gall
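Taking the two speedup formulas as stated for the preceding algorithms, s_1(ω) = 2 − 2^{2-ω} (Algorithm 1) and s_2(ω) = 2^{ω-1} − 2 (Algorithm 2), the quoted values and the crossover can be checked numerically:

```python
from math import log2

def s1(w):  # Algorithm 1: skip zero blocks
    return 2 - 2 ** (2 - w)

def s2(w):  # Algorithm 2: recursive triangular multiplication
    return 2 ** (w - 1) - 2

print(max(s1(3), s2(3)))                  # classical (omega = 3): 2.0
print(max(s1(log2(7)), s2(log2(7))))      # Strassen: ~1.5
print(max(s1(2.3728639), s2(2.3728639)))  # Coppersmith-Winograd-Le Gall: ~1.228
print(1 + log2(2 + 2 ** 0.5))             # crossover where s1 = s2: ~2.771
```

Below the crossover Algorithm 1 dominates (and Algorithm 2 even drops below 1 for ω < log_2 6); above it, Algorithm 2 wins.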

Using fast rectangular matrix multiplication

MM(x, y, z): (x × y) by (y × z) matrix multiplication.

Huang and Pan (1998), with n = m^2:

    MM(m, m, m^2) = O(n^{1.667}) < m · MM(m, m, m) = O(n^{1.687})

By a transposition argument, MM(m, m^2, m) = (1 + o(1)) MM(m, m, m^2).

Open problem: can we still get a speedup > 1 from the structure of A in this setting?

Total speedup

Theoretical speedup of fast Lagrange inversion over BK 2.1, from both avoiding Newton iteration and speeding up the matrix multiplication:

Dominant operation       Complexity                  Newton   Matrix   Total
Polynomial, classical    O(n^{5/2})                  1.214    1        1.214
Polynomial, Karatsuba    O(n^{1/2 + log_2 3})        1.308    1        1.308
Matrix, classical        O(n^2)                      1.333    2.000    2.666
Matrix, Strassen         O(n^{(1 + log_2 7)/2})      1.364    1.500    2.047
Matrix, Cop.-Win.-LG     O(n^{1.687})                1.451    1.228    1.781
Matrix, Huang-Pan        O(n^{1.667})                1.458    1?       1.458?
(Polynomial, FFT)        O(n^{3/2} log^{1+o(1)} n)   1.546    1        1.546

Implementation

Composition and reversion implemented in FLINT:
- Z/pZ, p < 2^64
- Z
- Q

In Arb (interval coefficients):
- R
- C

Reversion over Z/pZ (p = 2^63 + 29)

n      Lagrange     BK 2.1        BK 2.2     Fast Lagrange
10     10 × 10^-6   10 × 10^-6    18 × 10^-6 6.1 × 10^-6
10^2   0.0028       0.00092       0.0016     0.00054
10^3   0.690        0.066         0.120      0.045
10^4   110          3.3 (8%)      7.1        2.1
10^5   12100        144 (20%)     315        85
10^6   1900000      8251 (28%)    15131      4832

Time in seconds. (Percentages: relative time spent on matrix multiplication.)

Reversion over Z/pZ (p = 2^63 + 29)

[Figure: time relative to fast Lagrange inversion, for n from 10 to 10^6; curves for Lagrange, BK 2.2, BK 2.1, and fast Lagrange.]

Reversion over Z

F_1 = Σ_{k≥1} k! x^k,   F_2 = x/(1 − 4x),   F_3 = (x + x^2)/(1 + x + x^2)

n      BK 2.1                              Fast Lagrange
       F_1        F_2        F_3           F_1         F_2         F_3
10     10 × 10^-6 10 × 10^-6 10 × 10^-6    4.9 × 10^-6 4.3 × 10^-6 4.1 × 10^-6
10^2   0.0078     0.0021     0.0021        0.0064      0.00096     0.00065
10^3   10         1.1        0.96          7.1         0.71        0.22
10^4   24356      1453       538           8903        426         152

Time in seconds.

Reversion over Q

F_1 = exp(x) − 1,   F_2 = x exp(x),   F_3 = 3x(1 − x^2)/(2(1 − x + x^2)^2)

n      BK 2.1                              Fast Lagrange
       F_1        F_2        F_3           F_1         F_2         F_3
10     31 × 10^-6 28 × 10^-6 28 × 10^-6    11 × 10^-6  11 × 10^-6  9.1 × 10^-6
10^2   0.012      0.021      0.0088        0.0089      0.0081      0.0019
10^3   8.8        17         3.1           14          13          0.65
10^4   13812      27057      1990          35633       27823       784

Time in seconds.

Reversion over R

F_1 = exp(x) − 1,   F_2 = log(1 + x),   F_3 = Γ(1 + x) − 1

n      BK 2.1                              Fast Lagrange
       F_1        F_2        F_3           F_1         F_2         F_3
10     26 × 10^-6 24 × 10^-6 52 × 10^-6    17 × 10^-6  17 × 10^-6  39 × 10^-6
10^2   0.0042     0.059      0.0049        0.0031      0.044       0.0031
10^3   0.79       91         0.53          0.41        61          0.33
10^4   154        -          207           364         -           74

Precision doubled until the coefficient of x^{n-1} is a precise interval. Time in seconds.

Conclusion and questions

- Reversion algorithm analogous to BK 2.1, avoiding Newton iteration
- Usually the fastest algorithm for reversion in practice

Open questions:
- Is there a Newton-free analog of BK 2.2 for reversion?
- Can more be said about the structured matrix products?
- Can denominators be handled better? Numerical stability?

Note: the algorithm has been published in F. Johansson, "A fast algorithm for reversion of power series", Math. Comp. 84, 475-484, 2015. Some more comments appear in my PhD thesis.