DIVIDE AND CONQUER II




master theorem, integer multiplication, matrix multiplication, convolution and FFT

Lecture slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. Last updated on 11/3/17 5:53 AM.

DIVIDE AND CONQUER II: master theorem

Master method

Goal. Recipe for solving common divide-and-conquer recurrences:

$$T(n) = a\,T\!\left(\frac{n}{b}\right) + f(n)$$

with $T(0) = 0$ and $T(1) = \Theta(1)$.

Terms.
$a \ge 1$ is the (integer) number of subproblems.
$b \ge 2$ is the (integer) factor by which the subproblem size decreases.
$f(n)$ = work to divide and combine subproblems.

Recursion tree.
$k = \log_b n$ levels.
$a^i$ = number of subproblems at level $i$.
$n / b^i$ = size of subproblem at level $i$.

Geometric series

Fact 1. For $r \ne 1$: $1 + r + r^2 + \cdots + r^{k-1} = \dfrac{1 - r^k}{1 - r}$.
Fact 2. For $r = 1$: $1 + r + r^2 + \cdots + r^{k-1} = k$.
Fact 3. For $r < 1$: $1 + r + r^2 + \cdots = \dfrac{1}{1 - r}$.

Ex. $1 + 1/2 + 1/4 + 1/8 + \cdots = 2$.

Case 1: total cost dominated by cost of leaves

Ex 1. If $T(n)$ satisfies $T(n) = 3\,T(n/2) + n$, with $T(1) = 1$, then $T(n) = \Theta(n^{\log_2 3})$.

Recursion tree. Level $i$ has $3^i$ subproblems of size $n/2^i$ and costs $3^i (n/2^i)$; there are $\log_2 n$ levels below the root, and the leaf level contains $3^{\log_2 n} = n^{\log_2 3}$ copies of $T(1)$. With $r = 3/2 > 1$:

$$T(n) = (1 + r + r^2 + \cdots + r^{\log_2 n})\,n = \frac{r^{1 + \log_2 n} - 1}{r - 1}\,n = 3\,n^{\log_2 3} - 2n$$

Case 2: total cost evenly distributed among levels

Ex 2. If $T(n)$ satisfies $T(n) = 2\,T(n/2) + n$, with $T(1) = 1$, then $T(n) = \Theta(n \log n)$.

Recursion tree. Level $i$ costs $2^i (n/2^i) = n$; the leaf level contains $2^{\log_2 n} = n$ copies of $T(1)$. With $r = 1$:

$$T(n) = (1 + r + r^2 + \cdots + r^{\log_2 n})\,n = n\,(\log_2 n + 1)$$

Case 3: total cost dominated by cost of root

Ex 3. If $T(n)$ satisfies $T(n) = 3\,T(n/4) + n^5$, with $T(1) = 1$, then $T(n) = \Theta(n^5)$.

Recursion tree. Level $i$ costs $3^i (n/4^i)^5$; there are $\log_4 n$ levels below the root, and the leaf level contains $3^{\log_4 n} = n^{\log_4 3}$ copies of $T(1)$. With $r = 3/4^5 < 1$:

$$n^5 \le T(n) \le (1 + r + r^2 + r^3 + \cdots)\,n^5 \le \frac{1}{1 - r}\,n^5$$

Master theorem

Master theorem. Suppose that $T(n)$ is a function on the non-negative integers that satisfies the recurrence

$$T(n) = a\,T\!\left(\frac{n}{b}\right) + f(n)$$

with $T(0) = 0$ and $T(1) = \Theta(1)$, where $n/b$ means either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then,

Case 1. If $f(n) = O(n^k)$ for some constant $k < \log_b a$, then $T(n) = \Theta(n^{\log_b a})$.

Ex. $T(n) = 3\,T(n/2) + 5n$. Here $a = 3$, $b = 2$, $f(n) = 5n$, $k = 1$, $\log_b a \approx 1.58$, so $T(n) = \Theta(n^{\log_2 3})$.
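As a quick sanity check on the three recursion-tree examples above, the recurrences can be evaluated directly for exact powers of b and compared with the closed forms read off the trees. This is only an illustrative sketch: the helper names `solve` and `check_examples` are assumptions made for this example and do not come from the slides.

```python
def solve(a, b, f, n):
    """Evaluate T(n) = a*T(n/b) + f(n) with T(1) = 1, assuming n is an exact power of b."""
    if n == 1:
        return 1
    return a * solve(a, b, f, n // b) + f(n)


def check_examples(max_exp=8):
    """Compare each example recurrence with the closed form from its recursion tree."""
    for j in range(1, max_exp + 1):
        n = 2 ** j
        # Ex 1: T(n) = 3 n^(log2 3) - 2n, and n^(log2 3) = 3^j when n = 2^j.
        assert solve(3, 2, lambda m: m, n) == 3 * 3 ** j - 2 * n
        # Ex 2: T(n) = n (log2 n + 1).
        assert solve(2, 2, lambda m: m, n) == n * (j + 1)
        # Ex 3: n^5 <= T(n) <= n^5 / (1 - r) with r = 3 / 4^5, so 1 / (1 - r) = 1024 / 1021.
        n4 = 4 ** j
        t = solve(3, 4, lambda m: m ** 5, n4)
        assert n4 ** 5 <= t and 1021 * t <= 1024 * n4 ** 5
    return True
```

Calling `check_examples()` exercises all three cases with exact integer arithmetic, so no floating-point tolerance is needed.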

Master theorem

Master theorem. Suppose that $T(n)$ is a function on the non-negative integers that satisfies the recurrence

$$T(n) = a\,T\!\left(\frac{n}{b}\right) + f(n)$$

with $T(0) = 0$ and $T(1) = \Theta(1)$, where $n/b$ means either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then,

Case 1. If $f(n) = O(n^k)$ for some constant $k < \log_b a$, then $T(n) = \Theta(n^{\log_b a})$.
Case 2. If $f(n) = \Theta(n^k \log^p n)$ for $p \ge 0$ and $k = \log_b a$, then $T(n) = \Theta(n^k \log^{p+1} n)$.
Case 3. If $f(n) = \Omega(n^k)$ for some constant $k > \log_b a$, and if $a\,f(n/b) \le c\,f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.

Ex (Case 2). $T(n) = 2\,T(n/2) + 17\,n \log n$. Here $a = 2$, $b = 2$, $f(n) = 17\,n \log n$, $k = 1$, $p = 1$, $\log_b a = 1$, so $T(n) = \Theta(n \log^2 n)$.

Ex (Case 3). $T(n) = 3\,T(n/2) + n^2$. Here $a = 3$, $b = 2$, $f(n) = n^2$, $k = 2$, $\log_b a \approx 1.58$. Regularity condition: $3\,(n/2)^2 \le c\,n^2$ for $c = 3/4$. So $T(n) = \Theta(n^2)$.

Note. The regularity condition holds if $f(n) = \Theta(n^k)$.

Pf sketch.
Use recursion tree to sum up terms (assuming $n$ is an exact power of $b$).
Three cases for geometric series.
Deal with floors and ceilings.

Master theorem need not apply

Gaps in master theorem.
Number of subproblems must be a constant. $T(n) = n\,T(n/2) + n^2$
Number of subproblems must be $\ge 1$. $T(n) = \tfrac{1}{2}\,T(n/2) + n^2$
Non-polynomial separation between $f(n)$ and $n^{\log_b a}$. $T(n) = 2\,T(n/2) + n / \log n$
$f(n)$ is not positive. $T(n) = 2\,T(n/2) - n^2$
Regularity condition does not hold. $T(n) = T(n/2) + n\,(2 - \cos n)$
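The case analysis above can be mechanized for the common situation where the driving function has the form $f(n) = \Theta(n^k \log^p n)$ with $p \ge 0$. The following is a minimal sketch under that assumption; the function name `master_theorem` and its string output format are hypothetical choices for illustration, and recurrences that fall into the gaps listed above are outside its scope.

```python
import math


def master_theorem(a, b, k, p=0):
    """Classify T(n) = a*T(n/b) + Theta(n^k log^p n), assuming p >= 0.

    Returns the asymptotic growth rate as a string.
    """
    if a < 1 or b <= 1 or p < 0:
        raise ValueError("need a >= 1, b > 1 and p >= 0")
    crit = math.log(a, b)  # critical exponent log_b a
    if math.isclose(k, crit):
        # Case 2: every level of the recursion tree contributes about equally.
        return f"Theta(n^{k:g} log^{p + 1:g} n)"
    if k < crit:
        # Case 1: the leaves dominate.
        return f"Theta(n^{crit:.3f})"
    # Case 3: the root dominates; regularity holds with c = a / b^k < 1 for this f.
    return f"Theta(n^{k:g} log^{p:g} n)" if p else f"Theta(n^{k:g})"
```

For instance, `master_theorem(3, 2, 1)` corresponds to the Case 1 example $T(n) = 3T(n/2) + 5n$, and `master_theorem(2, 2, 1, 1)` to the Case 2 example $T(n) = 2T(n/2) + 17 n \log n$.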

Akra–Bazzi theorem

Desiderata. Generalizes master theorem to divide-and-conquer algorithms where subproblems have substantially different sizes.

Theorem. [Akra–Bazzi] Given constants $a_i > 0$ and $0 < b_i < 1$, functions $h_i(n) = O(n / \log^2 n)$ and $g(n) = O(n^c)$, if the function $T(n)$ satisfies the recurrence

$$T(n) = \sum_{i=1}^{k} a_i\,T(b_i n + h_i(n)) + g(n)$$

then

$$T(n) = \Theta\!\left(n^p \left(1 + \int_1^n \frac{g(u)}{u^{p+1}}\,du\right)\right), \quad \text{where } p \text{ satisfies } \sum_{i=1}^{k} a_i\,b_i^{\,p} = 1.$$

Here the $i$-th term represents $a_i$ subproblems of size $b_i n$, and $h_i(n)$ is a small perturbation to handle floors and ceilings.

Ex. $T(n) = \tfrac{7}{4}\,T(\lfloor n/2 \rfloor) + T(\lceil \tfrac{3}{4} n \rceil) + n^2$, with $T(0) = 0$ and $T(1) = 1$.
$a_1 = 7/4$, $b_1 = 1/2$, $a_2 = 1$, $b_2 = 3/4$, so $p = 2$.
$h_1(n) = \lfloor \tfrac{1}{2} n \rfloor - \tfrac{1}{2} n$, $h_2(n) = \lceil \tfrac{3}{4} n \rceil - \tfrac{3}{4} n$.
$g(n) = n^2$, so $T(n) = \Theta(n^2 \log n)$.

DIVIDE AND CONQUER II: integer multiplication

Integer addition

Addition. Given two n-bit integers a and b, compute a + b.
Subtraction. Given two n-bit integers a and b, compute a − b.
Grade-school algorithm. $\Theta(n)$ bit operations.
Remark. Grade-school addition and subtraction algorithms are asymptotically optimal.

Integer multiplication

Multiplication. Given two n-bit integers a and b, compute a × b.
Grade-school algorithm. $\Theta(n^2)$ bit operations.
Conjecture. [Kolmogorov 1952] Grade-school algorithm is optimal.
Theorem. [Karatsuba 1960] Conjecture is wrong.


Divide-and-conquer multiplication

To multiply two n-bit integers x and y:
Divide x and y into low- and high-order bits.
Multiply four ½n-bit integers, recursively.
Add and shift to obtain result.

$m = \lceil n/2 \rceil$
$a = \lfloor x / 2^m \rfloor$, $b = x \bmod 2^m$
$c = \lfloor y / 2^m \rfloor$, $d = y \bmod 2^m$

$$(2^m a + b)(2^m c + d) = 2^{2m}\,ac + 2^m (bc + ad) + bd$$

Use bit shifting to compute the 4 terms.

Ex. [figure: x split into high-order bits a and low-order bits b; y split into c and d]

MULTIPLY(x, y, n)
  IF (n = 1)
    RETURN x × y.
  ELSE
    m ← ⌈n / 2⌉.
    a ← ⌊x / 2^m⌋; b ← x mod 2^m.
    c ← ⌊y / 2^m⌋; d ← y mod 2^m.
    e ← MULTIPLY(a, c, m).
    f ← MULTIPLY(b, d, m).
    g ← MULTIPLY(b, c, m).
    h ← MULTIPLY(a, d, m).
    RETURN 2^(2m) e + 2^m (g + h) + f.

Divide-and-conquer multiplication analysis

Proposition. The divide-and-conquer multiplication algorithm requires $\Theta(n^2)$ bit operations to multiply two n-bit integers.

Pf. Apply case 1 of the master theorem to the recurrence

$$T(n) = \underbrace{4\,T(\lceil n/2 \rceil)}_{\text{recursive calls}} + \underbrace{\Theta(n)}_{\text{add, shift}} \;\Rightarrow\; T(n) = \Theta(n^2)$$

Karatsuba trick

To compute middle term $bc + ad$, use identity:

$$bc + ad = ac + bd - (a - b)(c - d)$$

With $m$, $a$, $b$, $c$, $d$ as above:

$$(2^m a + b)(2^m c + d) = 2^{2m}\,ac + 2^m (bc + ad) + bd = 2^{2m}\,ac + 2^m \bigl(ac + bd - (a - b)(c - d)\bigr) + bd$$

Bottom line. Only three multiplications of n/2-bit integers.


Karatsuba multiplication

KARATSUBA-MULTIPLY(x, y, n)
  IF (n = 1)
    RETURN x × y.
  ELSE
    m ← ⌈n / 2⌉.
    a ← ⌊x / 2^m⌋; b ← x mod 2^m.
    c ← ⌊y / 2^m⌋; d ← y mod 2^m.
    e ← KARATSUBA-MULTIPLY(a, c, m).
    f ← KARATSUBA-MULTIPLY(b, d, m).
    g ← KARATSUBA-MULTIPLY(a − b, c − d, m).
    RETURN 2^(2m) e + 2^m (e + f − g) + f.

Karatsuba analysis

Proposition. Karatsuba's algorithm requires $O(n^{1.585})$ bit operations to multiply two n-bit integers.

Pf. Apply Case 1 of the master theorem to the recurrence

$$T(n) = 3\,T(\lceil n/2 \rceil) + \Theta(n) \;\Rightarrow\; T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})$$

Practice. Faster than grade-school algorithm for about 320–640 bits.

Integer arithmetic reductions

Integer multiplication. Given two n-bit integers, compute their product.

problem                  arithmetic        running time
integer multiplication   a × b             M(n)
integer division         a / b, a mod b    Θ(M(n))
integer square           a^2               Θ(M(n))
integer square root      ⌊√a⌋              Θ(M(n))

Integer arithmetic problems with the same complexity M(n) as integer multiplication.

History of asymptotic complexity of integer multiplication

year   algorithm             order of growth
?      brute force           Θ(n^2)
1962   Karatsuba–Ofman       Θ(n^1.585)
1963   Toom-3, Toom-4        Θ(n^1.465), Θ(n^1.404)
1966   Toom–Cook             Θ(n^(1 + ε))
1971   Schönhage–Strassen    Θ(n log n log log n)
2007   Fürer                 n log n 2^(O(log* n))
?      ?                     Θ(n)

Number of bit operations to multiply two n-bit integers. These algorithms are used in Maple, Mathematica, gcc, cryptography, ...

Remark. GNU Multiple Precision Library uses one of five different algorithms depending on size of operands.
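The KARATSUBA-MULTIPLY pseudocode above can be sketched in Python roughly as follows. This is an illustrative sketch, not a tuned implementation: it derives n from the operands' bit lengths instead of taking it as a parameter, uses an assumed small-operand cutoff of 16 as the base case, and tracks the sign of (a − b)(c − d) explicitly so that the recursion only ever sees non-negative integers. The function name `karatsuba` is an assumption for this example.

```python
def karatsuba(x, y):
    """Multiply non-negative integers x and y with three recursive half-size products."""
    if x < 16 or y < 16:
        return x * y  # base case (assumed cutoff): fall back to built-in multiply
    n = max(x.bit_length(), y.bit_length())
    m = (n + 1) // 2                      # m = ceil(n / 2)
    mask = (1 << m) - 1
    a, b = x >> m, x & mask               # x = 2^m * a + b
    c, d = y >> m, y & mask               # y = 2^m * c + d
    e = karatsuba(a, c)                   # ac
    f = karatsuba(b, d)                   # bd
    # (a - b)(c - d) is negative exactly when one factor is negative.
    sign = -1 if (a < b) != (c < d) else 1
    g = sign * karatsuba(abs(a - b), abs(c - d))
    # bc + ad = ac + bd - (a - b)(c - d) = e + f - g
    return (e << (2 * m)) + ((e + f - g) << m) + f
```

Python integers are arbitrary precision, so the shifts play the role of the "add and shift" step; in practice the built-in `*` already switches to Karatsuba for large operands, so this sketch is for exposition only.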

SECTION 4.

DIVIDE AND CONQUER II: matrix multiplication

Dot product

Dot product. Given two length-n vectors a and b, compute $c = a \cdot b$.
Grade-school. $\Theta(n)$ arithmetic operations.

$$a \cdot b = \sum_{i=1}^{n} a_i\,b_i$$

Ex. [figure: worked numeric example of a dot product]

Remark. Grade-school dot product algorithm is asymptotically optimal.

Matrix multiplication

Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.
Grade-school. $\Theta(n^3)$ arithmetic operations.

$$c_{ij} = \sum_{k=1}^{n} a_{ik}\,b_{kj}$$

where $C = (c_{ij})$, $A = (a_{ij})$ and $B = (b_{ij})$ are n-by-n.

Q. Is grade-school matrix multiplication algorithm asymptotically optimal?

Block matrix multiplication

[figure: C = A × B partitioned into blocks] For example, $C_{11} = A_{11} \times B_{11} + A_{12} \times B_{21}$.

Block matrix multiplication: warmup

To multiply two n-by-n matrices A and B:
Divide: partition A and B into ½n-by-½n blocks.
Conquer: multiply 8 pairs of ½n-by-½n matrices, recursively.
Combine: add appropriate products using 4 matrix additions.

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \times \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

$C_{11} = (A_{11} \times B_{11}) + (A_{12} \times B_{21})$
$C_{12} = (A_{11} \times B_{12}) + (A_{12} \times B_{22})$
$C_{21} = (A_{21} \times B_{11}) + (A_{22} \times B_{21})$
$C_{22} = (A_{21} \times B_{12}) + (A_{22} \times B_{22})$

Running time. Apply case 1 of the master theorem.

$$T(n) = \underbrace{8\,T(n/2)}_{\text{recursive calls}} + \underbrace{\Theta(n^2)}_{\text{add, form submatrices}} \;\Rightarrow\; T(n) = \Theta(n^3)$$

Strassen's trick

Key idea. Can multiply two 2-by-2 matrices via 7 scalar multiplications (plus 11 additions and 7 subtractions).

$P_1 \leftarrow A_{11} \times (B_{12} - B_{22})$
$P_2 \leftarrow (A_{11} + A_{12}) \times B_{22}$
$P_3 \leftarrow (A_{21} + A_{22}) \times B_{11}$
$P_4 \leftarrow A_{22} \times (B_{21} - B_{11})$
$P_5 \leftarrow (A_{11} + A_{22}) \times (B_{11} + B_{22})$
$P_6 \leftarrow (A_{12} - A_{22}) \times (B_{21} + B_{22})$
$P_7 \leftarrow (A_{11} - A_{21}) \times (B_{11} + B_{12})$

$C_{11} = P_5 + P_4 - P_2 + P_6$
$C_{12} = P_1 + P_2$
$C_{21} = P_3 + P_4$
$C_{22} = P_1 + P_5 - P_3 - P_7$

Pf. $C_{12} = P_1 + P_2 = A_{11} \times (B_{12} - B_{22}) + (A_{11} + A_{12}) \times B_{22} = A_{11} \times B_{12} + A_{12} \times B_{22}$.

The same identities hold when the entries are ½n-by-½n blocks instead of scalars, so two n-by-n matrices can be multiplied via 7 multiplications of ½n-by-½n matrices (plus 11 additions and 7 subtractions).

Strassen's algorithm

STRASSEN(n, A, B)   (assume n is a power of 2)
  IF (n = 1) RETURN A × B.
  Partition A and B into ½n-by-½n blocks.   (keep track of indices of submatrices; don't copy matrix entries)
  P1 ← STRASSEN(n / 2, A11, (B12 − B22)).
  P2 ← STRASSEN(n / 2, (A11 + A12), B22).
  P3 ← STRASSEN(n / 2, (A21 + A22), B11).
  P4 ← STRASSEN(n / 2, A22, (B21 − B11)).
  P5 ← STRASSEN(n / 2, (A11 + A22), (B11 + B22)).
  P6 ← STRASSEN(n / 2, (A12 − A22), (B21 + B22)).
  P7 ← STRASSEN(n / 2, (A11 − A21), (B11 + B12)).
  C11 = P5 + P4 − P2 + P6.
  C12 = P1 + P2.
  C21 = P3 + P4.
  C22 = P1 + P5 − P3 − P7.
  RETURN C.
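The STRASSEN pseudocode above translates fairly directly into Python on lists of lists. The sketch below assumes n is a power of 2 and, for readability, copies the submatrices rather than tracking indices as the slide recommends; the helper names `mat_add`, `mat_sub` and `strassen` are assumptions made for this example.

```python
def mat_add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def mat_sub(X, Y):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def strassen(A, B):
    """Multiply two n-by-n matrices (n a power of 2) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def blocks(M):
        # Returns M11, M12, M21, M22 (copies, unlike the index-tracking advice above).
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    A11, A12, A21, A22 = blocks(A)
    B11, B12, B21, B22 = blocks(B)
    P1 = strassen(A11, mat_sub(B12, B22))
    P2 = strassen(mat_add(A11, A12), B22)
    P3 = strassen(mat_add(A21, A22), B11)
    P4 = strassen(A22, mat_sub(B21, B11))
    P5 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    P6 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    P7 = strassen(mat_sub(A11, A21), mat_add(B11, B12))
    C11 = mat_add(mat_sub(mat_add(P5, P4), P2), P6)
    C12 = mat_add(P1, P2)
    C21 = mat_add(P3, P4)
    C22 = mat_sub(mat_sub(mat_add(P1, P5), P3), P7)
    # Stitch the four blocks back into one n-by-n matrix.
    return [r + s for r, s in zip(C11, C12)] + [r + s for r, s in zip(C21, C22)]
```

A practical version would switch to the classical triple loop below some crossover size instead of recursing down to 1-by-1 blocks.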

Analysis of Strassen's algorithm

Theorem. Strassen's algorithm requires $O(n^{2.81})$ arithmetic operations to multiply two n-by-n matrices.

Pf. Apply case 1 of the master theorem to the recurrence

$$T(n) = \underbrace{7\,T(n/2)}_{\text{recursive calls}} + \underbrace{\Theta(n^2)}_{\text{add, subtract}} \;\Rightarrow\; T(n) = \Theta(n^{\log_2 7}) = O(n^{2.81})$$

Q. What if n is not a power of 2?
A. Could pad matrices with zeros.

Strassen's algorithm: practice

Implementation issues.
Sparsity.
Caching effects.
Numerical stability.
Odd matrix dimensions.
Crossover to classical algorithm when n is "small".

Common misperception. "Strassen's algorithm is only a theoretical curiosity."
Apple reports 8x speedup on G4 Velocity Engine when n ≈ 2,048.
Range of instances where it's useful is a subject of controversy.

Linear algebra reductions

Matrix multiplication. Given two n-by-n matrices, compute their product.

problem                      linear algebra       order of growth
matrix multiplication        A × B                MM(n)
matrix inversion             A^(-1)               Θ(MM(n))
determinant                  |A|                  Θ(MM(n))
system of linear equations   Ax = b               Θ(MM(n))
LU decomposition             A = L U              Θ(MM(n))
least squares                min ||Ax − b||_2     Θ(MM(n))

Numerical linear algebra problems with the same complexity MM(n) as matrix multiplication.

Fast matrix multiplication: theory

Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
A. Yes! [Strassen 1969] $\Theta(n^{\log_2 7}) = O(n^{2.807})$

Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?
A. Impossible. [Hopcroft–Kerr 1971] $\Theta(n^{\log_2 6}) = O(n^{2.59})$

Q. Multiply two 3-by-3 matrices with 21 scalar multiplications?
A. Unknown. $\Theta(n^{\log_3 21}) = O(n^{2.77})$

Begun, the decimal wars have. [Pan, Bini et al., Schönhage, …]
Two 20-by-20 matrices with 4,460 scalar multiplications. $O(n^{2.805})$
Two 48-by-48 matrices with 47,217 scalar multiplications. $O(n^{2.7801})$
A year later. $O(n^{2.7799})$
December … $O(n^{…})$
January … $O(n^{…})$

History of asymptotic complexity of matrix multiplication

year   algorithm               order of growth
?      brute force             O(n^3)
1969   Strassen                O(n^2.808)
1978   Pan                     O(n^2.796)
1979   Bini                    O(n^2.780)
1981   Schönhage               O(n^2.522)
1982   Romani                  O(n^2.517)
1982   Coppersmith–Winograd    O(n^2.496)
1986   Strassen                O(n^2.479)
1989   Coppersmith–Winograd    O(n^2.376)
2010   Stothers                O(n^2.3737)
2011   Williams                O(n^2.3727)
?      ?                       O(n^(2 + ε))

Number of floating-point operations to multiply two n-by-n matrices.

SECTION 5.6

DIVIDE AND CONQUER II: convolution and FFT

Fourier analysis

Fourier theorem. [Fourier, Dirichlet, Riemann] Any (sufficiently smooth) periodic function can be expressed as the sum of a series of sinusoids.

$$y(t) = \frac{2}{\pi} \sum_{k=1}^{N} \frac{\sin kt}{k}$$

[figure: plot of y(t) against t for a fixed number of terms N]

Euler's identity

Euler's identity. $e^{ix} = \cos x + i \sin x$.
Sinusoids. Sum of sines and cosines = sum of complex exponentials.


Time domain vs. frequency domain

Signal. [touch tone button 1] $y(t) = \tfrac{1}{2} \sin(2\pi \cdot 697\,t) + \tfrac{1}{2} \sin(2\pi \cdot 1209\,t)$

[figure: time domain (sound pressure against time) and frequency domain (amplitude against frequency)]

Reference: Cleve Moler, Numerical Computing with MATLAB

Signal. [recording, 8192 samples per second]

[figure: magnitude of discrete Fourier transform]

Reference: Cleve Moler, Numerical Computing with MATLAB

Fast Fourier transform

FFT. Fast way to convert between time-domain and frequency-domain.
Alternate viewpoint. Fast way to multiply and evaluate polynomials. (We take this approach.)

"If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it." (Numerical Recipes)

Fast Fourier transform: applications

Applications.
Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry, …
Digital media. [DVD, JPEG, MP3, H.264]
Medical diagnostics. [MRI, CT, PET scans, ultrasound]
Numerical solutions to Poisson's equation.
Integer and polynomial multiplication.
Shor's quantum factoring algorithm.
…

"The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT." (Charles van Loan)

Fast Fourier transform: brief history

Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres.
Runge–König (1924). Laid theoretical groundwork.
Danielson–Lanczos (1942). Efficient algorithm, x-ray crystallography.
Cooley–Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT. Paper published only after IBM lawyers decided not to set precedent of patenting numerical algorithms (conventional wisdom: nobody could make money selling software!).

Importance not fully realized until advent of digital computers.

Polynomials: coefficient representation

Polynomial. [coefficient representation]

$$A(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$$
$$B(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_{n-1} x^{n-1}$$

Add. $O(n)$ arithmetic operations.

$$A(x) + B(x) = (a_0 + b_0) + (a_1 + b_1)\,x + \cdots + (a_{n-1} + b_{n-1})\,x^{n-1}$$

Evaluate. $O(n)$ using Horner's method.

$$A(x) = a_0 + x\,(a_1 + x\,(a_2 + \cdots + x\,(a_{n-2} + x\,a_{n-1}) \cdots ))$$

Multiply (convolve). $O(n^2)$ using brute force.

$$A(x) \times B(x) = \sum_{i=0}^{2n-2} c_i\,x^i, \quad \text{where } c_i = \sum_{j=0}^{i} a_j\,b_{i-j}$$

A modest Ph.D. dissertation title

DEMONSTRATIO NOVA THEOREMATIS OMNEM FVNCTIONEM ALGEBRAICAM RATIONALEM INTEGRAM VNIVS VARIABILIS IN FACTORES REALES PRIMI VEL SECUNDI GRADVS RESOLVI POSSE. AVCTORE CAROLO FRIDERICO GAVSS. HELMSTADII, APVD C. G. FLECKEISEN.

"New proof of the theorem that every algebraic rational integral function in one variable can be resolved into real factors of the first or the second degree."

Opening passage of the dissertation, as shown on the slide:

Quaelibet aequatio algebraica determinata reduci potest ad formam x^m + Ax^(m-1) + Bx^(m-2) + etc. + M = 0, ita vt m sit numerus integer positiuus. Si partem primam huius aequationis per X denotamus, aequationique X = 0 per plures valores inaequales ipsius x satisfieri supponimus, puta ponendo x = α, x = β, x = γ etc. functio X per productum e factoribus x − α, x − β, x − γ etc. diuisibilis erit. Vice versa, si productum e pluribus factoribus simplicibus x − α, x − β, x − γ etc. functionem X metitur: aequationi X = 0 satisfiet, aequando ipsam x cuicunque quantitatum α, β, γ etc. Denique si X producto ex m factoribus talibus simplicibus aequalis est (siue omnes diuersi sint, siue quidam ex ipsis identici): alii factores simplices praeter hos functionem X metiri non poterunt. Quamobrem aequatio m ti gradus plures quam m radices habere nequit; simul vero patet, aequationem m ti gradus pauciores radices habere posse, etsi X in m factores simplices resolubilis sit:

Polynomials: point-value representation

Fundamental theorem of algebra. A degree n polynomial with complex coefficients has exactly n complex roots.

Corollary. A degree n − 1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.

[figure: points $(x_j, y_j)$ on the graph of A, with $y_j = A(x_j)$]

Polynomials: point-value representation

Polynomial. [point-value representation]

$A(x)$: $(x_0, y_0), \ldots, (x_{n-1}, y_{n-1})$
$B(x)$: $(x_0, z_0), \ldots, (x_{n-1}, z_{n-1})$

Add. $O(n)$ arithmetic operations.

$A(x) + B(x)$: $(x_0, y_0 + z_0), \ldots, (x_{n-1}, y_{n-1} + z_{n-1})$

Multiply (convolve). $O(n)$, but need $2n - 1$ points.

$A(x) \times B(x)$: $(x_0, y_0 \times z_0), \ldots, (x_{2n-1}, y_{2n-1} \times z_{2n-1})$

Evaluate. $O(n^2)$ using Lagrange's formula.

$$A(x) = \sum_{k=0}^{n-1} y_k\,\frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)}$$

Converting between two representations

Tradeoff. Fast evaluation or fast multiplication. We want both!

representation   multiply   evaluate
coefficient      O(n^2)     O(n)
point-value      O(n)       O(n^2)

Goal. Efficient conversion between two representations, so that all ops are fast.

coefficient representation $a_0, a_1, \ldots, a_{n-1}$ ⇔ point-value representation $(x_0, y_0), \ldots, (x_{n-1}, y_{n-1})$

Converting between two representations: brute force

Coefficient ⇒ point-value. Given a polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$, evaluate it at n distinct points $x_0, \ldots, x_{n-1}$.

$$\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\ 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \end{pmatrix}$$

Running time. $O(n^2)$ for matrix-vector multiply (or n Horner's).

Point-value ⇒ coefficient. Given n distinct points $x_0, \ldots, x_{n-1}$ and values $y_0, \ldots, y_{n-1}$, find unique polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$ that has given values at given points. This means solving the same linear system for $a_0, \ldots, a_{n-1}$; the Vandermonde matrix is invertible iff the $x_i$ are distinct.

Running time. $O(n^3)$ for Gaussian elimination, or $O(n^{2.3727})$ via fast matrix multiplication.
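To make the trade-off between the two representations concrete, the sketch below implements Horner's method and brute-force convolution on coefficient lists (lowest-order coefficient first, which is an assumed convention). The function names `horner` and `convolve` are illustrative choices, not names from the slides.

```python
def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_{n-1} x^{n-1} with O(n) multiplications."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def convolve(a, b):
    """Brute-force product of two coefficient lists: c_i = sum_j a_j * b_{i-j}."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c
```

For example, with `a = [1, 2, 3]` and `b = [4, 5]`, evaluating `convolve(a, b)` at any point x with `horner` gives the same value as `horner(a, x) * horner(b, x)`, which is exactly the pointwise multiplication used in the point-value representation.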

Divide-and-conquer

Decimation in frequency. Break up polynomial into low and high powers.
$A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7$.
$A_{low}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$.
$A_{high}(x) = a_4 + a_5 x + a_6 x^2 + a_7 x^3$.
$A(x) = A_{low}(x) + x^4 A_{high}(x)$.

Decimation in time. Break up polynomial into even and odd powers.
$A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7$.
$A_{even}(x) = a_0 + a_2 x + a_4 x^2 + a_6 x^3$.
$A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + a_7 x^3$.
$A(x) = A_{even}(x^2) + x\,A_{odd}(x^2)$.

Coefficient to point-value representation: intuition

Coefficient ⇒ point-value. Given a polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$, evaluate it at n distinct points $x_0, \ldots, x_{n-1}$. (We get to choose which ones!)

Divide. Break up polynomial into even and odd powers, as above, so that
$A(x) = A_{even}(x^2) + x\,A_{odd}(x^2)$.
$A(-x) = A_{even}(x^2) - x\,A_{odd}(x^2)$.

Intuition. Choose two points to be ±1.
$A(1) = A_{even}(1) + 1 \cdot A_{odd}(1)$.
$A(-1) = A_{even}(1) - 1 \cdot A_{odd}(1)$.
Can evaluate polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½n at 1 point.

Intuition. Choose four complex points to be ±1, ±i.
$A(1) = A_{even}(1) + 1 \cdot A_{odd}(1)$.
$A(-1) = A_{even}(1) - 1 \cdot A_{odd}(1)$.
$A(i) = A_{even}(-1) + i\,A_{odd}(-1)$.
$A(-i) = A_{even}(-1) - i\,A_{odd}(-1)$.
Can evaluate polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½n at 2 points.

Discrete Fourier transform

Coefficient ⇒ point-value. Given a polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$, evaluate it at n distinct points $x_0, \ldots, x_{n-1}$.

Key idea. Choose $x_k = \omega^k$ where $\omega$ is principal n-th root of unity.

$$\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^1 & \omega^2 & \omega^3 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n-1} \end{pmatrix}$$

The vector $y$ is the DFT of $a$; the matrix is the Fourier matrix $F_n$.

Roots of unity

Def. An n-th root of unity is a complex number x such that $x^n = 1$.

Fact. The n-th roots of unity are: $\omega^0, \omega^1, \ldots, \omega^{n-1}$ where $\omega = e^{2\pi i / n}$.
Pf. $(\omega^k)^n = (e^{2\pi i k / n})^n = (e^{\pi i})^{2k} = (-1)^{2k} = 1$.

Fact. The ½n-th roots of unity are: $\nu^0, \nu^1, \ldots, \nu^{n/2-1}$ where $\nu = \omega^2 = e^{4\pi i / n}$.

[figure: unit circle for n = 8, with $\omega^0 = \nu^0 = 1$, $\omega^1$, $\omega^2 = \nu^1 = i$, $\omega^3$, $\omega^4 = \nu^2 = -1$, $\omega^5$, $\omega^6 = \nu^3 = -i$, $\omega^7$]

Fast Fourier transform

Goal. Evaluate a degree n − 1 polynomial $A(x) = a_0 + \cdots + a_{n-1} x^{n-1}$ at its n-th roots of unity: $\omega^0, \omega^1, \ldots, \omega^{n-1}$.

Divide. Break up polynomial into even and odd powers.
$A_{even}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2}\,x^{n/2-1}$.
$A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1}\,x^{n/2-1}$.
$A(x) = A_{even}(x^2) + x\,A_{odd}(x^2)$.

Conquer. Evaluate $A_{even}(x)$ and $A_{odd}(x)$ at the ½n-th roots of unity: $\nu^0, \nu^1, \ldots, \nu^{n/2-1}$.

Combine.
$A(\omega^k) = A_{even}(\nu^k) + \omega^k A_{odd}(\nu^k)$, $0 \le k < n/2$   (since $\nu^k = (\omega^k)^2$)
$A(\omega^{k + n/2}) = A_{even}(\nu^k) - \omega^k A_{odd}(\nu^k)$, $0 \le k < n/2$   (since $\nu^k = (\omega^{k + n/2})^2$ and $\omega^{k + n/2} = -\omega^k$)

FFT: implementation

FFT(n, a_0, a_1, a_2, …, a_(n−1))   (assumes n is a power of 2)
  IF (n = 1) RETURN a_0.
  (e_0, e_1, …, e_(n/2−1)) ← FFT(n / 2, a_0, a_2, a_4, …, a_(n−2)).
  (d_0, d_1, …, d_(n/2−1)) ← FFT(n / 2, a_1, a_3, a_5, …, a_(n−1)).
  FOR k = 0 TO n / 2 − 1.
    ω^k ← e^(2πik/n).
    y_k ← e_k + ω^k d_k.
    y_(k+n/2) ← e_k − ω^k d_k.
  RETURN (y_0, y_1, y_2, …, y_(n−1)).

FFT: summary

Theorem. The FFT algorithm evaluates a degree n − 1 polynomial at each of the n-th roots of unity in $O(n \log n)$ steps and $O(n)$ extra space (assumes n is a power of 2).

Pf. $T(n) = 2\,T(n/2) + \Theta(n) \;\Rightarrow\; T(n) = \Theta(n \log n)$.

coefficient representation $a_0, a_1, \ldots, a_{n-1}$ ⇒ point-value representation $(\omega^0, y_0), \ldots, (\omega^{n-1}, y_{n-1})$ in $O(n \log n)$; the reverse direction is still open (???).
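The FFT pseudocode above maps almost line for line onto Python. The sketch below is a minimal recursive version under the same assumption that n is a power of 2; it follows the slides' convention $\omega = e^{2\pi i / n}$ (positive exponent), which differs in sign from the forward transform of many numerical libraries. The function name `fft` is chosen for this example only.

```python
import cmath


def fft(a):
    """Evaluate the polynomial with coefficients a at the n-th roots of unity.

    Assumes len(a) is a power of 2 and uses omega = e^(2*pi*i/n).
    """
    n = len(a)
    if n == 1:
        return list(a)
    e = fft(a[0::2])  # A_even at the (n/2)-th roots of unity
    d = fft(a[1::2])  # A_odd at the (n/2)-th roots of unity
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)  # omega^k
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]
    return y
```

The slices `a[0::2]` and `a[1::2]` allocate new lists at every level, which matches the O(n) extra space in the theorem but is not how an optimized in-place implementation would be written.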

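FFT: recursion tree

[figure: recursion tree, each level obtained by an inverse perfect shuffle]
$a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7$
$a_0, a_2, a_4, a_6$ | $a_1, a_3, a_5, a_7$
$a_0, a_4$ | $a_2, a_6$ | $a_1, a_5$ | $a_3, a_7$
$a_0$ | $a_4$ | $a_2$ | $a_6$ | $a_1$ | $a_5$ | $a_3$ | $a_7$   (bit-reversed order)

Inverse discrete Fourier transform

Point-value ⇒ coefficient. Given n distinct points $x_0, \ldots, x_{n-1}$ and values $y_0, \ldots, y_{n-1}$, find unique polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$ that has given values at given points.

Inverse DFT. With $x_k = \omega^k$, the coefficients are $a = F_n^{-1}\,y$, where $F_n$ is the Fourier matrix with entries $(F_n)_{jk} = \omega^{jk}$.

Claim. Inverse of Fourier matrix $F_n$ is given by following formula:

$$G_n = \frac{1}{n} \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{-1} & \omega^{-2} & \omega^{-3} & \cdots & \omega^{-(n-1)} \\ 1 & \omega^{-2} & \omega^{-4} & \omega^{-6} & \cdots & \omega^{-2(n-1)} \\ 1 & \omega^{-3} & \omega^{-6} & \omega^{-9} & \cdots & \omega^{-3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{-(n-1)} & \omega^{-2(n-1)} & \omega^{-3(n-1)} & \cdots & \omega^{-(n-1)(n-1)} \end{pmatrix}$$

Note. $F_n / \sqrt{n}$ is a unitary matrix.

Consequence. To compute inverse FFT, apply same algorithm but use $\omega^{-1} = e^{-2\pi i / n}$ as principal n-th root of unity (and divide the result by n).

Inverse FFT: proof of correctness

Claim. $F_n$ and $G_n$ are inverses.

Pf.

$$(F_n G_n)_{k k'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{kj}\,\omega^{-j k'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{(k - k')\,j} = \begin{cases} 1 & \text{if } k = k' \\ 0 & \text{otherwise} \end{cases}$$

where the last step uses the summation lemma below.

Summation lemma. Let $\omega$ be a principal n-th root of unity. Then

$$\sum_{j=0}^{n-1} \omega^{kj} = \begin{cases} n & \text{if } k \equiv 0 \pmod{n} \\ 0 & \text{otherwise} \end{cases}$$

Pf.
If k is a multiple of n then $\omega^k = 1$, so the series sums to n.
Each n-th root of unity $\omega^k$ is a root of $x^n - 1 = (x - 1)(1 + x + x^2 + \cdots + x^{n-1})$.
If $\omega^k \ne 1$ we have $1 + \omega^k + \omega^{2k} + \cdots + \omega^{k(n-1)} = 0$, so the series sums to 0.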
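The summation lemma above is easy to spot-check numerically. The short sketch below sums the powers of $\omega^k$ in floating point for a given n and k; the helper name `root_power_sum` is hypothetical, and the comparison needs a small tolerance because of rounding.

```python
import cmath


def root_power_sum(n, k):
    """Return sum_{j=0}^{n-1} omega^(k*j) for omega = e^(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return sum(omega ** (k * j) for j in range(n))
```

Up to rounding, `root_power_sum(8, 16)` is 8 and `root_power_sum(8, 3)` is 0, matching the two cases of the lemma.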

Inverse FFT: implementation

Note. Need to divide result by n.

INVERSE-FFT(n, y_0, y_1, y_2, …, y_(n−1))   (assumes n is a power of 2)
  IF (n = 1) RETURN y_0.
  (e_0, e_1, …, e_(n/2−1)) ← INVERSE-FFT(n / 2, y_0, y_2, y_4, …, y_(n−2)).
  (d_0, d_1, …, d_(n/2−1)) ← INVERSE-FFT(n / 2, y_1, y_3, y_5, …, y_(n−1)).
  FOR k = 0 TO n / 2 − 1.
    ω^k ← e^(−2πik/n).
    a_k ← (e_k + ω^k d_k).          (switch roles of a_i and y_i)
    a_(k+n/2) ← (e_k − ω^k d_k).
  RETURN (a_0, a_1, a_2, …, a_(n−1)).

Inverse FFT: summary

Theorem. The inverse FFT algorithm interpolates a degree n − 1 polynomial given values at each of the n-th roots of unity in $O(n \log n)$ steps (assumes n is a power of 2).

coefficient representation $a_0, a_1, \ldots, a_{n-1}$ ⇒ point-value representation $(\omega^0, y_0), \ldots, (\omega^{n-1}, y_{n-1})$ in $O(n \log n)$ (FFT); point-value ⇒ coefficient in $O(n \log n)$ (inverse FFT).

Polynomial multiplication

Theorem. Can multiply two degree n − 1 polynomials in $O(n \log n)$ steps.

Pf. (Pad with 0s to make n a power of 2.)
Start from the coefficient representations $a_0, a_1, \ldots, a_{n-1}$ and $b_0, b_1, \ldots, b_{n-1}$.
2 FFTs, $O(n \log n)$: compute $A(\omega^0), \ldots, A(\omega^{2n-1})$ and $B(\omega^0), \ldots, B(\omega^{2n-1})$.
Point-value multiplication, $O(n)$: compute $C(\omega^0), \ldots, C(\omega^{2n-1})$.
Inverse FFT, $O(n \log n)$: recover the coefficient representation $c_0, c_1, \ldots, c_{2n-2}$.

FFT in practice?
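The three-step polynomial multiplication described above can be sketched end to end as follows. This is an illustrative sketch, assuming integer coefficients (so the result can be rounded back from floating point) and modest input sizes where rounding error stays below 1/2; the names `_transform`, `fft`, `inverse_fft` and `poly_multiply` are choices made for this example.

```python
import cmath


def _transform(a, sign):
    """Recursive FFT with omega = e^(sign * 2*pi*i/n); len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    e = _transform(a[0::2], sign)
    d = _transform(a[1::2], sign)
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]
    return y


def fft(a):
    """Coefficients to values at the n-th roots of unity."""
    return _transform(a, +1)


def inverse_fft(y):
    """Values at the n-th roots of unity back to coefficients (note the division by n)."""
    return [v / len(y) for v in _transform(y, -1)]


def poly_multiply(a, b):
    """Multiply two integer-coefficient polynomials given lowest-order coefficient first."""
    size = 1
    while size < len(a) + len(b) - 1:   # pad with 0s up to a power of 2
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    c = inverse_fft([u * v for u, v in zip(fa, fb)])   # pointwise multiplication
    return [round(v.real) for v in c[:len(a) + len(b) - 1]]
```

For example, `poly_multiply([1, 2, 3], [4, 5])` multiplies $1 + 2x + 3x^2$ by $4 + 5x$ and returns the coefficients of $4 + 13x + 22x^2 + 15x^3$.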


FFT in practice

Fastest Fourier transform in the West. [Frigo–Johnson]
Optimized C library.
Features: DFT, DCT, real, complex, any size, any dimension.
Won 1999 Wilkinson Prize for Numerical Software.
Portable, competitive with vendor-tuned code.

Implementation details.
Core algorithm is nonrecursive version of Cooley–Tukey.
Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem.
Runs in $O(n \log n)$ time, even when n is prime.
Multidimensional FFTs.

Integer multiplication, redux

Integer multiplication. Given two n-bit integers $a = a_{n-1} \ldots a_1 a_0$ and $b = b_{n-1} \ldots b_1 b_0$, compute their product a × b.

Convolution algorithm.
Form two polynomials:
$A(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$
$B(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_{n-1} x^{n-1}$
Note: $a = A(2)$, $b = B(2)$.
Compute $C(x) = A(x) \times B(x)$.
Evaluate $C(2) = a \times b$.
Running time: $O(n \log n)$ complex arithmetic operations.

Theory. [Schönhage–Strassen 1971] $O(n \log n \log \log n)$ bit operations.
Theory. [Fürer 2007] $n \log n\,2^{O(\log^* n)}$ bit operations.
Practice. [GNU Multiple Precision Arithmetic Library] It uses brute force, Karatsuba, and FFT, depending on the size of n.
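The reduction from integer multiplication to convolution described above can be illustrated with a few lines of Python. The sketch below deliberately uses a brute-force $O(n^2)$ convolution as a stand-in so that it stays self-contained; substituting an FFT-based convolution at that step is what gives the $O(n \log n)$ arithmetic-operation bound. The function names `bits` and `multiply_via_convolution` are assumptions for this example.

```python
def bits(a):
    """Little-endian bit list a_0, a_1, ..., so that a = A(2)."""
    return [(a >> i) & 1 for i in range(max(a.bit_length(), 1))]


def multiply_via_convolution(a, b):
    """Multiply non-negative integers by convolving their bit polynomials."""
    A, B = bits(a), bits(b)
    # Brute-force convolution stand-in: C(x) = A(x) * B(x).
    C = [0] * (len(A) + len(B) - 1)
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            C[i + j] += ai * bj
    # Evaluate C(2); the shifts take care of carries between coefficients.
    return sum(c << i for i, c in enumerate(C))
```

Note that the coefficients of C can exceed 1, which is why the final evaluation at 2 is needed to propagate carries and recover the product.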
