The tangent FFT. D. J. Bernstein University of Illinois at Chicago


1 The tangent FFT. D. J. Bernstein, University of Illinois at Chicago

2 Advertisement. SPEED: Software Performance Enhancement for Encryption and Decryption. A workshop on software speeds for secret-key cryptography and public-key cryptography. Amsterdam, June 11-12.

3 The convolution problem. How quickly can we multiply polynomials in the ring $\mathbf{R}[x]$? Answer depends on degrees, representation of polynomials, number of polynomials, etc. Answer also depends on definition of "quickly". Many models of computation; many interesting cost measures.

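4 Assume two inputs $f, g \in \mathbf{R}[x]$, $\deg f < m$, $\deg g \le n - m$, so $\deg fg < n$. Assume $f$, $g$, $fg$ represented as coeff sequences. How quickly can we compute the $n$ coeffs of $fg$, given $f$'s $m$ coeffs and $g$'s $n - m + 1$ coeffs? Inputs $(f_0, f_1, \ldots, f_{m-1}) \in \mathbf{R}^m$, $(g_0, g_1, \ldots, g_{n-m}) \in \mathbf{R}^{n-m+1}$. Output $(h_0, h_1, \ldots, h_{n-1}) \in \mathbf{R}^n$ with $h_0 + h_1 x + \cdots + h_{n-1} x^{n-1} = (f_0 + f_1 x + \cdots)(g_0 + g_1 x + \cdots)$.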
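As a baseline, the problem statement above can be sketched directly in Python. The name `poly_mul` is an illustrative choice, not something from the talk, and the method is the schoolbook one (one mult per pair of input coefficients), not any of the fast algorithms.

```python
def poly_mul(f, g):
    """Multiply two coefficient sequences (lowest degree first), schoolbook style."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj   # one mult and one add per coefficient pair
    return h

# m = 2 and n = 4: f has m coeffs, g has n - m + 1 = 3 coeffs, fg has n = 4 coeffs.
print(poly_mul([1, 2], [3, 4, 5]))   # [3, 10, 13, 10]
```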

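5 Assume $\mathbf{R}$-algebraic algorithms (without divs, branches): chains of binary $\mathbf{R}$-adds $u, v \mapsto u + v$, binary $\mathbf{R}$-subs $u, v \mapsto u - v$, binary $\mathbf{R}$-mults $u, v \mapsto uv$, starting from inputs and constants. Example: 1963 Karatsuba. (Diagram not preserved.)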
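The slide's own Karatsuba diagram did not survive, so the following is only a sketch of the standard Karatsuba identity for two linear polynomials, written as a straight-line chain of adds, subs, and mults. The function name `karatsuba_2x2` is an assumption made for illustration.

```python
def karatsuba_2x2(f0, f1, g0, g1):
    """Return (h0, h1, h2) with h0 + h1*x + h2*x^2 = (f0 + f1*x)(g0 + g1*x)."""
    h0 = f0 * g0          # mult
    h2 = f1 * g1          # mult
    u = f0 + f1           # add
    v = g0 + g1           # add
    w = u * v             # mult: w = h0 + h1 + h2
    h1 = (w - h0) - h2    # two subs
    return h0, h1, h2

print(karatsuba_2x2(1, 2, 3, 4))   # (3, 10, 8)
```

The chain uses three mults, two adds, and two subs, with no divisions and no branches.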

6 Cost measure for this talk: total $\mathbf{R}$-algebraic complexity. Cost 1 for binary $\mathbf{R}$-add; cost 1 for binary $\mathbf{R}$-sub; cost 1 for binary $\mathbf{R}$-mult; cost 0 for constant in $\mathbf{R}$. Many real-world computations use (e.g.) Pentium M's floating-point operations to approximate operations in $\mathbf{R}$. Properly scheduled operations achieve Pentium M cycles $\approx$ total $\mathbf{R}$-algebraic complexity.

7 Fast Fourier transforms. Define $\zeta_n \in \mathbf{C}$ as $\exp(2\pi i/n)$. Define $T : \mathbf{C}[x]/(x^n - 1) \to \mathbf{C}^n$ as $f \mapsto (f(1), f(\zeta_n), \ldots, f(\zeta_n^{n-1}))$. Can very quickly compute $T$. First publication of fast algorithm: 1866 Gauss. Easy to see that Gauss's FFT uses $O(n \lg n)$ arithmetic operations if $n$ is a power of 2. Several subsequent reinventions, ending with 1965 Cooley-Tukey.

8 Inverse map is also very fast. Multiplication in $\mathbf{C}^n$ is very fast. 1966 Sande, 1966 Stockham: Can very quickly multiply in $\mathbf{C}[x]/(x^n - 1)$ or $\mathbf{C}[x]$ or $\mathbf{R}[x]$ by mapping $\mathbf{C}[x]/(x^n - 1)$ to $\mathbf{C}^n$. Given $f, g \in \mathbf{C}[x]/(x^n - 1)$: compute $fg$ as $T^{-1}(T(f)T(g))$. Given $f, g \in \mathbf{C}[x]$ with $\deg fg < n$: compute $fg$ from its image in $\mathbf{C}[x]/(x^n - 1)$. $\mathbf{R}$-algebraic complexity $O(n \lg n)$.
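The multiplication recipe on this slide can be sketched as follows. To keep the example short, `T` and `T_inv` are written as direct quadratic-time evaluations rather than fast transforms, so this shows only the structure of the method, not its speed; all function names are illustrative assumptions.

```python
import cmath

def T(f):
    """Evaluate f at 1, zeta_n, ..., zeta_n^(n-1), where n = len(f)."""
    n = len(f)
    zeta = cmath.exp(2j * cmath.pi / n)
    return [sum(c * zeta ** (j * k) for j, c in enumerate(f)) for k in range(n)]

def T_inv(v):
    """Recover the n coefficients from the n values (inverse of T)."""
    n = len(v)
    zeta = cmath.exp(2j * cmath.pi / n)
    return [sum(c * zeta ** (-j * k) for k, c in enumerate(v)) / n for j in range(n)]

def cyclic_mul(f, g):
    """Product in C[x]/(x^n - 1): transform, multiply pointwise, transform back."""
    return T_inv([a * b for a, b in zip(T(f), T(g))])

# deg fg = 3 < n = 4, so the cyclic product equals the ordinary product.
h = cyclic_mul([1, 2, 0, 0], [3, 4, 5, 0])
print([round(c.real) for c in h])   # [3, 10, 13, 10]
```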

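9 A closer look at costs. More precise analysis of Gauss FFT (and Cooley-Tukey FFT): $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}^n$ using $(1/2) n \lg n$ binary $\mathbf{C}$-adds, $(1/2) n \lg n$ binary $\mathbf{C}$-subs, $(1/2) n \lg n$ binary $\mathbf{C}$-mults, if $n$ is a power of 2. $(a, b) \in \mathbf{R}^2$ represents $a + bi \in \mathbf{C}$. $\mathbf{C}$-add, $\mathbf{C}$-sub, $\mathbf{C}$-mult cost 2, 2, 6: $(a, b) + (c, d) = (a + c, b + d)$, $(a, b) - (c, d) = (a - c, b - d)$, $(a, b)(c, d) = (ac - bd, ad + bc)$.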
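A small sketch of this bookkeeping: each complex operation below returns its result together with its cost in real operations, and `gauss_fft_cost` adds up the three operation counts. The function names are assumptions made for this example.

```python
def c_add(u, v):
    (a, b), (c, d) = u, v
    return (a + c, b + d), 2                     # two real adds

def c_sub(u, v):
    (a, b), (c, d) = u, v
    return (a - c, b - d), 2                     # two real subs

def c_mul(u, v):
    (a, b), (c, d) = u, v
    return (a * c - b * d, a * d + b * c), 6     # four mults, one sub, one add

def gauss_fft_cost(n):
    """(1/2) n lg n each of C-adds, C-subs, C-mults; n must be a power of 2."""
    lg = n.bit_length() - 1
    per_kind = n * lg // 2
    return per_kind * (2 + 2 + 6)                # equals 5 n lg n

print(c_mul((1, 2), (3, 4)))    # ((-5, 10), 6)
print(gauss_fft_cost(8))        # 120
```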

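10 Total cost $5 n \lg n$. Easily save some time: eliminate mults by $1$; absorb mults by $-1$, $i$, $-i$ into subsequent operations; simplify mults by $\pm\sqrt{\pm i}$ using, e.g., $(a, b)(1/\sqrt{2}, 1/\sqrt{2}) = ((a - b)/\sqrt{2}, (a + b)/\sqrt{2})$. Cost $5 n \lg n - 10 n + 16$ to map $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}^n$, if $n$ is a power of 2 with $n \ge 4$.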
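The simplified multiplication by $\sqrt{i} = (1 + i)/\sqrt{2}$ can be checked numerically. This is a minimal sketch; `mul_sqrt_i` is a hypothetical name, and the comparison against `cmath` is only a sanity check.

```python
import cmath
import math

INV_SQRT2 = 1 / math.sqrt(2)   # a constant, so it costs nothing in this model

def mul_sqrt_i(a, b):
    """(a + bi)(1 + i)/sqrt(2) using one sub, one add, two mults: cost 4, not 6."""
    return (a - b) * INV_SQRT2, (a + b) * INV_SQRT2

re, im = mul_sqrt_i(3.0, 5.0)
expected = complex(3.0, 5.0) * cmath.exp(1j * cmath.pi / 4)
print(abs(complex(re, im) - expected) < 1e-12)
```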

11 What about cost of convolution? $5 n \lg n + O(n)$ to compute $T(f)$, $5 n \lg n + O(n)$ to compute $T(g)$, $O(n)$ to multiply in $\mathbf{C}^n$, similar $5 n \lg n + O(n)$ for $T^{-1}$. Total cost $15 n \lg n + O(n)$ to compute $fg \in \mathbf{C}[x]/(x^n - 1)$ given $f, g \in \mathbf{C}[x]/(x^n - 1)$. Total cost $(15/2) n \lg n + O(n)$ to compute $fg \in \mathbf{R}[x]/(x^n - 1)$ given $f, g \in \mathbf{R}[x]/(x^n - 1)$: map $\mathbf{R}[x]/(x^n - 1) \to \mathbf{R}^2 \oplus \mathbf{C}^{n/2 - 1}$ (Gauss) to save half the time.

13 1968 R. Yavne: Can do better! Cost $4 n \lg n - 6 n + 8$ to map $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}^n$, if $n$ is a power of 2 with $n \ge 2$. 2004 James Van Buskirk: Can do better! Cost $(34/9) n \lg n + O(n)$. Expositions of the new algorithm: Frigo, Johnson, in IEEE Trans. Signal Processing; Lundy, Van Buskirk, in Computing; Bernstein, this talk, expanding an old idea of Fiduccia.

16 Van Buskirk, comp.arch, January 2005: "Have you ever considered changing djbfft to get better opcounts along the lines of home.comcast.net/~kmbtib?" Bernstein, comp.arch: "What do you mean, better opcounts? The algebraic complexity of a size-$2^k$ complex DFT has stood at $(3k - 3) 2^k + 4$ additions and $(k - 3) 2^k + 4$ multiplications since 1968." Van Buskirk, comp.arch: "Oh, you're so 20th century, Dan. Read the handwriting on the wall."

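17 Understanding the FFT. The FFT trick: $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}[x]/(x^{n/2} - 1) \oplus \mathbf{C}[x]/(x^{n/2} + 1)$ by unique $\mathbf{C}[x]$-algebra morphism. Cost $2n$: $n/2$ $\mathbf{C}$-adds, $n/2$ $\mathbf{C}$-subs. E.g. $n = 4$: $\mathbf{C}[x]/(x^4 - 1) \to \mathbf{C}[x]/(x^2 - 1) \oplus \mathbf{C}[x]/(x^2 + 1)$ by $g_0 + g_1 x + g_2 x^2 + g_3 x^3 \mapsto (g_0 + g_2) + (g_1 + g_3) x,\ (g_0 - g_2) + (g_1 - g_3) x$. Representation: $(g_0, g_1, g_2, g_3) \mapsto ((g_0 + g_2), (g_1 + g_3)), ((g_0 - g_2), (g_1 - g_3))$.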

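18 Recurse: $\mathbf{C}[x]/(x^{n/2} - 1) \to \mathbf{C}[x]/(x^{n/4} - 1) \oplus \mathbf{C}[x]/(x^{n/4} + 1)$; similarly $\mathbf{C}[x]/(x^{n/2} + 1) \to \mathbf{C}[x]/(x^{n/4} - i) \oplus \mathbf{C}[x]/(x^{n/4} + i)$; continue to $\mathbf{C}[x]/(x - 1) \oplus \cdots$. General case: $\mathbf{C}[x]/(x^n - \alpha^2) \to \mathbf{C}[x]/(x^{n/2} - \alpha) \oplus \mathbf{C}[x]/(x^{n/2} + \alpha)$ by $g_0 + \cdots + g_{n/2} x^{n/2} + \cdots \mapsto (g_0 + \alpha g_{n/2}) + (g_1 + \alpha g_{n/2+1}) x + \cdots,\ (g_0 - \alpha g_{n/2}) + (g_1 - \alpha g_{n/2+1}) x + \cdots$. Cost $5n$: $n/2$ $\mathbf{C}$-mults, $n/2$ $\mathbf{C}$-adds, $n/2$ $\mathbf{C}$-subs. Recurse, eliminate easy mults. Cost $5 n \lg n - 10 n + 16$.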
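The general case can be turned into a short recursive sketch. As an assumption made for this example, the constant $\alpha^2$ is tracked as $\exp(2\pi i t)$ with $t$ an exact fraction of a turn, so that its two square roots are easy to name. The function `fft` below returns each leaf $\mathbf{C}[x]/(x - c)$ as a pair $(u, \text{value})$ with $c = \exp(2\pi i u)$. No easy mults are eliminated here.

```python
import cmath
from fractions import Fraction

def fft(g, t=Fraction(0)):
    """Reduce g modulo x^n - exp(2*pi*i*t) down to n leaves (u, g(exp(2*pi*i*u)))."""
    n = len(g)
    if n == 1:
        return [(t, g[0])]               # C[x]/(x - c): the class of g is g(c)
    half = n // 2
    alpha = cmath.exp(2j * cmath.pi * float(t / 2))   # alpha^2 = exp(2*pi*i*t)
    lo = [g[k] + alpha * g[k + half] for k in range(half)]   # mod x^(n/2) - alpha
    hi = [g[k] - alpha * g[k + half] for k in range(half)]   # mod x^(n/2) + alpha
    return fft(lo, t / 2) + fft(hi, t / 2 + Fraction(1, 2))

g = [1, 2, 3, 4]
for u, value in fft(g):
    x = cmath.exp(2j * cmath.pi * float(u))
    direct = sum(c * x ** k for k, c in enumerate(g))
    print(u, abs(value - direct) < 1e-9)
```

The leaves come out in bit-reversed order (here $u = 0, 1/2, 1/4, 3/4$), which is the usual price of this in-place style of splitting.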

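19 Alternative: the twisted FFT. After previous $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}[x]/(x^{n/2} - 1) \oplus \mathbf{C}[x]/(x^{n/2} + 1)$, apply unique $\mathbf{C}$-algebra morphism $\mathbf{C}[x]/(x^{n/2} + 1) \to \mathbf{C}[y]/(y^{n/2} - 1)$ that maps $x$ to $\zeta_n y$. $g_0 + g_1 x + \cdots + g_{n/2} x^{n/2} + \cdots \mapsto (g_0 + g_{n/2}) + (g_1 + g_{n/2+1}) x + \cdots,\ (g_0 - g_{n/2}) + \zeta_n (g_1 - g_{n/2+1}) y + \cdots$. Again cost $5n$: $n/2$ $\mathbf{C}$-mults, $n/2$ $\mathbf{C}$-adds, $n/2$ $\mathbf{C}$-subs. Eliminate easy mults, recurse. Cost $5 n \lg n - 10 n + 16$.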
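A minimal recursive sketch of the twisted FFT follows. The name `twisted_fft` and the choice to return values in natural order are assumptions for illustration; easy mults are not eliminated, so this shows the structure rather than the optimized operation count.

```python
import cmath

def twisted_fft(g):
    """Return [g(zeta_n^j) for j in range(n)], n = len(g) a power of 2."""
    n = len(g)
    if n == 1:
        return list(g)
    half = n // 2
    zeta = cmath.exp(2j * cmath.pi / n)
    # Image in C[x]/(x^(n/2) - 1).
    lo = [g[k] + g[k + half] for k in range(half)]
    # Image in C[x]/(x^(n/2) + 1), twisted by x = zeta_n * y into C[y]/(y^(n/2) - 1).
    hi = [zeta ** k * (g[k] - g[k + half]) for k in range(half)]
    out = [0] * n
    out[0::2] = twisted_fft(lo)   # values at even powers of zeta_n
    out[1::2] = twisted_fft(hi)   # values at odd powers of zeta_n
    return out

g = [1, 2, 3, 4, 5, 6, 7, 8]
zeta8 = cmath.exp(2j * cmath.pi / 8)
direct = [sum(c * zeta8 ** (j * k) for k, c in enumerate(g)) for j in range(8)]
print(all(abs(a - b) < 1e-9 for a, b in zip(twisted_fft(g), direct)))
```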

20 The split-radix FFT. FFT and twisted FFT end up with same number of mults by $\zeta_n$, same number of mults by $\zeta_{n/2}$, same number of mults by $\zeta_{n/4}$, etc. Is this necessary? No! Split-radix FFT: more easy mults. "Don't twist until you see the whites of their $i$'s." (Can use same idea to speed up Schönhage-Strassen algorithm for integer multiplication.)

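21 Cost $2n$: $\mathbf{C}[x]/(x^n - 1) \to \mathbf{C}[x]/(x^{n/2} - 1) \oplus \mathbf{C}[x]/(x^{n/2} + 1)$. Cost $n$: $\mathbf{C}[x]/(x^{n/2} + 1) \to \mathbf{C}[x]/(x^{n/4} - i) \oplus \mathbf{C}[x]/(x^{n/4} + i)$. Cost $6(n/4)$: $\mathbf{C}[x]/(x^{n/4} - i) \to \mathbf{C}[y]/(y^{n/4} - 1)$ by $x \mapsto \zeta_n y$. Cost $6(n/4)$: $\mathbf{C}[x]/(x^{n/4} + i) \to \mathbf{C}[y]/(y^{n/4} - 1)$ by $x \mapsto \zeta_n^{-1} y$. Overall cost $6n$ to split into $1/2, 1/4, 1/4$, entropy $1.5$ bits. Eliminate easy mults, recurse. Cost $4 n \lg n - 6 n + 8$, exactly as in 1968 Yavne.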
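The closed form can be checked against a cost recurrence. The sketch below is bookkeeping only, under two assumptions that are not spelled out on the slide: in each twist of $n/4$ coefficients the mult by $1$ (at $k = 0$) is free, and the mult by $\sqrt{\pm i}$ (at $k = n/8$, when $n \ge 8$) costs 4 instead of 6. The name `split_radix_cost` is hypothetical.

```python
def split_radix_cost(n):
    """Real-operation count of the split-radix recursion; n must be a power of 2."""
    if n == 1:
        return 0
    if n == 2:
        return 4                          # one C-add and one C-sub
    splits = 2 * n + n                    # split x^n - 1, then split x^(n/2) + 1
    # Two twists of n/4 coefficients each: 6 per coefficient,
    # minus 6 for the free mult by 1, minus 2 for the cost-4 mult by sqrt(+-i).
    twists = 2 * (6 * (n // 4) - 8) if n >= 8 else 0
    return splits + twists + split_radix_cost(n // 2) + 2 * split_radix_cost(n // 4)

for e in range(1, 11):
    n = 2 ** e
    print(n, split_radix_cost(n), 4 * n * e - 6 * n + 8)
```

Under those assumptions the two printed columns agree for every power of 2 tried, for example 16 at $n = 4$ and 56 at $n = 8$.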

22 The tangent FFT. Several ways to achieve cost 6 for mult by $e^{i\theta}$. One approach: Factor $e^{i\theta}$ as $(1 + i \tan\theta) \cos\theta$. Cost 2 for mult by $\cos\theta$. Cost 4 for mult by $1 + i \tan\theta$. For stability and symmetry, use $\max\{|\cos\theta|, |\sin\theta|\}$ instead of $\cos\theta$. Surprise (Van Buskirk): Can merge some cost-2 mults!
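The factorization can be sketched as a cost-4 step followed by a cost-2 scaling. The branch on $|\cos\theta| \ge |\sin\theta|$ is one way to realize the "max" variant, using $e^{i\theta} = (\cot\theta + i)\sin\theta$ in the other case; the function name `rotate` is an assumption. In a real transform $\tan\theta$, $\cot\theta$, and the scale factor would be precomputed constants.

```python
import cmath
import math

def rotate(a, b, theta):
    """Multiply a + bi by exp(i*theta): a cost-4 tangent step, then a cost-2 scaling."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(c) >= abs(s):
        t = s / c                              # tan(theta)
        re, im = a - b * t, b + a * t          # (a + bi)(1 + i tan): 2 mults, 2 adds/subs
        return c * re, c * im                  # scale by cos: 2 mults
    cot = c / s                                # cot(theta)
    re, im = a * cot - b, b * cot + a          # (a + bi)(cot + i): 2 mults, 2 adds/subs
    return s * re, s * im                      # scale by sin: 2 mults

for theta in (0.3, 1.2, 2.5, -2.0):
    re, im = rotate(3.0, 5.0, theta)
    expected = complex(3.0, 5.0) * cmath.exp(1j * theta)
    print(theta, abs(complex(re, im) - expected) < 1e-12)
```

The point of separating the scaling is that it is a multiplication by a real constant, which is what later slides merge across levels of the recursion.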

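23 Rethink basis of $\mathbf{C}[x]/(x^n - 1)$. Instead of $1, x, \ldots, x^{n-1}$ use $1/s_{n,0}, x/s_{n,1}, \ldots, x^{n-1}/s_{n,n-1}$ where $s_{n,k} = \max\{|\cos\frac{2\pi k}{n}|, |\sin\frac{2\pi k}{n}|\} \cdot \max\{|\cos\frac{2\pi k}{n/4}|, |\sin\frac{2\pi k}{n/4}|\} \cdot \max\{|\cos\frac{2\pi k}{n/16}|, |\sin\frac{2\pi k}{n/16}|\} \cdots$. Now $(g_0, g_1, \ldots, g_{n-1})$ represents $g_0/s_{n,0} + \cdots + g_{n-1} x^{n-1}/s_{n,n-1}$. Note that $s_{n,k} = s_{n,k+n/4}$. Note that $\zeta_n^k (s_{n/4,k}/s_{n,k})$ is $\pm(1 + i \tan\cdots)$ or $\pm(\cot\cdots + i)$.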
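Both notes on this slide can be checked numerically. In the sketch below, `s(n, k)` is a hypothetical helper that multiplies the factors for $n, n/4, n/16, \ldots$ and stops once the modulus is at most 4, where every factor equals 1; $n$ is assumed to be a power of 2.

```python
import cmath
import math

def s(n, k):
    """Scale factor s_{n,k}: product of max(|cos|, |sin|) over n, n/4, n/16, ..."""
    result = 1.0
    m = n
    while m > 4:                    # for m <= 4 the angle is a multiple of pi/2
        angle = 2 * math.pi * k / m
        result *= max(abs(math.cos(angle)), abs(math.sin(angle)))
        m //= 4
    return result

n = 64
zeta = cmath.exp(2j * cmath.pi / n)

# Periodicity: s_{n,k} = s_{n,k+n/4}.
print(all(math.isclose(s(n, k), s(n, k + n // 4)) for k in range(n)))

# zeta_n^k * s_{n/4,k} / s_{n,k} has real or imaginary part equal to +-1,
# i.e. it has the shape +-(1 + i tan) or +-(cot + i).
ratios = [zeta ** k * s(n // 4, k) / s(n, k) for k in range(n)]
print(all(math.isclose(max(abs(w.real), abs(w.imag)), 1.0) for w in ratios))
```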

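24 Cost $2n$: $\mathbf{C}[x]/(x^n - 1)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[x]/(x^{n/2} - 1)$, basis $x^k/s_{n,k}$, $\oplus\ \mathbf{C}[x]/(x^{n/2} + 1)$, basis $x^k/s_{n,k}$. Cost $n$: $\mathbf{C}[x]/(x^{n/2} - 1)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[x]/(x^{n/4} - 1)$, basis $x^k/s_{n,k}$, $\oplus\ \mathbf{C}[x]/(x^{n/4} + 1)$, basis $x^k/s_{n,k}$. Cost $n$: $\mathbf{C}[x]/(x^{n/2} + 1)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[x]/(x^{n/4} - i)$, basis $x^k/s_{n,k}$, $\oplus\ \mathbf{C}[x]/(x^{n/4} + i)$, basis $x^k/s_{n,k}$.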

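25 Cost $n/2 - 2$: $\mathbf{C}[x]/(x^{n/4} - 1)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[x]/(x^{n/4} - 1)$, basis $x^k/s_{n/4,k}$. Cost $n/2 - 2$: $\mathbf{C}[x]/(x^{n/4} + 1)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[x]/(x^{n/4} + 1)$, basis $x^k/s_{n/2,k}$. Cost $n/2$: $\mathbf{C}[x]/(x^{n/4} + 1)$, basis $x^k/s_{n/2,k}$, $\to \mathbf{C}[x]/(x^{n/8} - i)$, basis $x^k/s_{n/2,k}$, $\oplus\ \mathbf{C}[x]/(x^{n/8} + i)$, basis $x^k/s_{n/2,k}$.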

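26 Cost $4(n/4) - 6$: $\mathbf{C}[x]/(x^{n/4} - i)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[y]/(y^{n/4} - 1)$, basis $y^k/s_{n/4,k}$. Cost $4(n/4) - 6$: $\mathbf{C}[x]/(x^{n/4} + i)$, basis $x^k/s_{n,k}$, $\to \mathbf{C}[y]/(y^{n/4} - 1)$, basis $y^k/s_{n/4,k}$. Cost $4(n/8) - 6$: $\mathbf{C}[x]/(x^{n/8} - i)$, basis $x^k/s_{n/2,k}$, $\to \mathbf{C}[y]/(y^{n/8} - 1)$, basis $y^k/s_{n/8,k}$. Cost $4(n/8) - 6$: $\mathbf{C}[x]/(x^{n/8} + i)$, basis $x^k/s_{n/2,k}$, $\to \mathbf{C}[y]/(y^{n/8} - 1)$, basis $y^k/s_{n/8,k}$.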

27 Overall cost $8.5 n - 28$ to split into $1/4, 1/4, 1/4, 1/8, 1/8$, entropy $9/4$. Recurse: $(34/9) n \lg n + O(n)$. What if input is in $\mathbf{C}[x]/(x^n - 1)$ with usual basis $1, x, \ldots, x^{n-1}$? Could scale immediately, but faster to scale upon twist. Cost $(34/9) n \lg n - (124/27) n - 2 \lg n - (2/9)(-1)^{\lg n} \lg n + (16/27)(-1)^{\lg n} + 8$, exactly as in 2004 Van Buskirk.
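The arithmetic behind these numbers can be reproduced with exact fractions. In this sketch each step cost from the preceding slides is written as a pair (coefficient of $n$, constant); the list `steps` and the function `tangent_cost` are illustrative names, and `tangent_cost` simply evaluates the closed form stated above.

```python
from fractions import Fraction

half = Fraction(1, 2)
# (coefficient of n, constant) for each step: 2n, n, n, n/2 - 2 (twice), n/2,
# 4(n/4) - 6 (twice), 4(n/8) - 6 (twice).
steps = [(2, 0), (1, 0), (1, 0), (half, -2), (half, -2), (half, 0),
         (1, -6), (1, -6), (half, -6), (half, -6)]
slope = sum(Fraction(a) for a, _ in steps)
const = sum(c for _, c in steps)
print(slope, const)                       # 17/2 -28

pieces = [Fraction(1, 4)] * 3 + [Fraction(1, 8)] * 2
# Each piece is 1/2^j, so its information content is j bits.
entropy = sum(p * (p.denominator.bit_length() - 1) for p in pieces)
print(entropy, slope / entropy)           # 9/4 34/9

def tangent_cost(n):
    """Closed-form operation count from the slide; n a power of 2, n >= 2."""
    lg = n.bit_length() - 1
    sign = (-1) ** lg
    return (Fraction(34, 9) * n * lg - Fraction(124, 27) * n - 2 * lg
            - Fraction(2, 9) * sign * lg + Fraction(16, 27) * sign + 8)

print(tangent_cost(64), 4 * 64 * 6 - 6 * 64 + 8)   # 1152 1160
```

The last line compares the closed form at $n = 64$ with the split-radix formula $4 n \lg n - 6 n + 8$ at the same size.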

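28 Easily handle $\mathbf{R}[x]/(x^n + 1)$ by mapping to $\mathbf{C}[x]/(x^{n/2} - i)$. Easily handle $\mathbf{R}[x]/(x^n - 1)$ by mapping to $\mathbf{R}[x]/(x^{n/2} - 1) \oplus \mathbf{R}[x]/(x^{n/2} + 1)$. Cost $(17/9) n \lg n + O(n)$ for $\mathbf{R}[x]/(x^n - 1) \to \mathbf{R}^2 \oplus \mathbf{C}^{n/2 - 1}$, so cost $(17/3) n \lg n + O(n)$ to compute $fg \in \mathbf{R}[x]/(x^n - 1)$ given $f, g \in \mathbf{R}[x]/(x^n - 1)$. Cost $(17/3) n \lg n + O(n)$ for size-$n$ convolution. Open: Can $17/3$ be improved?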
