Chapter 2. Divide-and-conquer algorithms

The divide-and-conquer strategy solves a problem by:
1. Breaking it into subproblems that are themselves smaller instances of the same type of problem.
2. Recursively solving these subproblems.
3. Appropriately combining their answers.

Multiplication

Johann Carl Friedrich Gauss (1777-1855): 1 + 2 + ··· + 100 = 100(1 + 100)/2 = 5050.

Gauss once noticed that although the product of two complex numbers
(a + bi)(c + di) = (ac − bd) + (bc + ad)i
seems to involve four real-number multiplications, it can in fact be done with just three: ac, bd, and (a + b)(c + d), since
bc + ad = (a + b)(c + d) − ac − bd.
In our big-O way of thinking, reducing the number of multiplications from four to three seems wasted ingenuity. But this modest improvement becomes very significant when applied recursively.
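As a tiny illustration, here is a Python sketch of the three-multiplication product (the function name is ours):

def gauss_complex_mult(a, b, c, d):
    # Compute (a + bi)(c + di) with three real multiplications instead of four.
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d) - ac - bd   # equals bc + ad
    return (ac - bd, cross)               # (real part, imaginary part)

assert gauss_complex_mult(1, 2, 3, 4) == (-5, 10)   # (1 + 2i)(3 + 4i) = -5 + 10i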

Suppose x and y are two n-bit integers, and assume for convenience that n is a power of 2.

Lemma. For every n there exists an n′ with n ≤ n′ ≤ 2n such that n′ is a power of 2.

As a first step toward multiplying x and y, we split each of them into their left and right halves, which are n/2 bits long:
x = x_L x_R = 2^{n/2} x_L + x_R
y = y_L y_R = 2^{n/2} y_L + y_R.
Then
xy = (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) = 2^n x_L y_L + 2^{n/2}(x_L y_R + x_R y_L) + x_R y_R.
The additions take linear time, as do the multiplications by powers of 2. The significant operations are the four n/2-bit multiplications; these we can handle by four recursive calls.

Our method for multiplying n-bit numbers starts by making recursive calls to multiply these four pairs of n/2-bit numbers, and then evaluates the preceding expression in O(n) time. Writing T(n) for the overall running time on n-bit inputs, we get the recurrence relation
T(n) = 4T(n/2) + O(n),
whose solution is O(n²).
By Gauss's trick, three multiplications, x_L y_L, x_R y_R, and (x_L + x_R)(y_L + y_R), suffice, since
x_L y_R + x_R y_L = (x_L + x_R)(y_L + y_R) − x_L y_L − x_R y_R.

A divide-and-conquer algorithm for integer multiplication

multiply(x, y)
// Input: positive integers x and y, in binary
// Output: their product
1. n = max(size of x, size of y), rounded up to a power of 2
2. if n = 1 then return xy
3. x_L, x_R = leftmost n/2, rightmost n/2 bits of x
4. y_L, y_R = leftmost n/2, rightmost n/2 bits of y
5. P1 = multiply(x_L, y_L)
6. P2 = multiply(x_R, y_R)
7. P3 = multiply(x_L + x_R, y_L + y_R)
8. return P1 · 2^n + (P3 − P1 − P2) · 2^{n/2} + P2
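A runnable Python sketch of this recursion, operating on Python integers rather than bit strings (the base case and splitting point are our choices):

def multiply(x, y):
    # Karatsuba-style divide and conquer on nonnegative integers.
    if x < 2 or y < 2:
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    xL, xR = x >> half, x & ((1 << half) - 1)   # x = xL * 2^half + xR
    yL, yR = y >> half, y & ((1 << half) - 1)
    p1 = multiply(xL, yL)
    p2 = multiply(xR, yR)
    p3 = multiply(xL + xR, yL + yR)
    return (p1 << (2 * half)) + ((p3 - p1 - p2) << half) + p2

assert multiply(1234567, 7654321) == 1234567 * 7654321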

The time analysis
The recurrence relation: T(n) = 3T(n/2) + O(n).
- The algorithm's recursive calls form a tree structure. At each successive level of recursion the subproblems get halved in size. At the (log₂ n)th level, the subproblems get down to size 1, and so the recursion ends.
- The height of the tree is log₂ n.
- The branching factor is 3: each problem recursively produces three smaller ones, with the result that at depth k in the tree there are 3^k subproblems, each of size n/2^k.
For each subproblem, a linear amount of work is done in identifying further subproblems and combining their answers. Therefore the total time spent at depth k in the tree is
3^k × O(n/2^k) = (3/2)^k × O(n).

The time analysis (cont'd)
3^k × O(n/2^k) = (3/2)^k × O(n).
At the very top level, when k = 0, we need O(n). At the bottom, when k = log₂ n, it is
O(3^{log₂ n}) = O(n^{log₂ 3}).
Between these two endpoints, the work done increases geometrically from O(n) to O(n^{log₂ 3}), by a factor of 3/2 per level. The sum of any increasing geometric series is, within a constant factor, simply the last term of the series. Therefore the overall running time is
O(n^{log₂ 3}) ≈ O(n^{1.59}).
We can do even better!

Recurrence relations

Master theorem
If T(n) = a T(⌈n/b⌉) + O(n^d) for some constants a > 0, b > 1, and d ≥ 0, then
T(n) = O(n^d)        if d > log_b a,
T(n) = O(n^d log n)  if d = log_b a,
T(n) = O(n^{log_b a}) if d < log_b a.
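For example, applied to the integer-multiplication recurrence above, T(n) = 3T(n/2) + O(n): here a = 3, b = 2, d = 1, and log₂ 3 ≈ 1.59 > 1 = d, so the third case gives T(n) = O(n^{log₂ 3}), matching the level-by-level analysis. For T(n) = 2T(n/2) + O(n), we have a = 2, b = 2, d = 1 = log₂ 2, so the second case gives O(n log n).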

Proof of Master Theorem
Assume that n is a power of b. This will not influence the final bound in any important way: n is at most a multiplicative factor of b away from some power of b. Next, notice that the size of the subproblems decreases by a factor of b with each level of recursion, and therefore reaches the base case after log_b n levels. This is the height of the recursion tree.
The branching factor of the recursion tree is a, so the kth level of the tree is made up of a^k subproblems, each of size n/b^k. The total work done at this level is
a^k × O((n/b^k)^d) = O(n^d) × (a/b^d)^k.

Proof of Master Theorem (cont'd)
The total work done is
∑_{k=0}^{log_b n} O(n^d) × (a/b^d)^k.
It's the sum of a geometric series with ratio a/b^d.
1. The ratio is less than 1. Then the series is decreasing, and its sum is just given by its first term, O(n^d).
2. The ratio is greater than 1. The series is increasing and its sum is given by its last term, O(n^{log_b a}):
n^d (a/b^d)^{log_b n} = n^d · a^{log_b n} / (b^{log_b n})^d = a^{log_b n} = a^{(log_a n)(log_b a)} = n^{log_b a}.
3. The ratio is exactly 1. In this case all O(log n) terms of the series are equal to O(n^d), so the sum is O(n^d log n).

Merge sort

The algorithm

mergesort(a[1...n])
// Input: an array of numbers a[1...n]
// Output: a sorted version of this array
1. if n > 1 then
2.   return merge(mergesort(a[1...⌊n/2⌋]), mergesort(a[⌊n/2⌋+1...n]))
3. else return a

merge(x[1...k], y[1...ℓ])
// Input: two sorted arrays x and y
// Output: a sorted version of the union of x and y
1. if k = 0 then return y[1...ℓ]
2. if ℓ = 0 then return x[1...k]
3. if x[1] ≤ y[1]
4. then return x[1] ∘ merge(x[2...k], y[1...ℓ])
5. else return y[1] ∘ merge(x[1...k], y[2...ℓ])
(Here ∘ denotes concatenation.)
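An equivalent runnable Python version (a sketch; the merge is written iteratively to avoid deep recursion):

def merge(x, y):
    # Merge two sorted lists into one sorted list.
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]

def mergesort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))

assert mergesort([5, 2, 9, 1, 5, 6]) == [1, 2, 5, 5, 6, 9]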

The time analysis
The recurrence relation: T(n) = 2T(n/2) + O(n). By the Master theorem, T(n) = O(n log n).

An n log n lower bound for sorting
Sorting algorithms can be depicted as trees. The depth of the tree, the number of comparisons on the longest path from root to leaf, is exactly the worst-case time complexity of the algorithm.
Consider any such tree that sorts an array of n elements. Each of its leaves is labeled by a permutation of {1, 2, ..., n}, and every permutation must appear as the label of some leaf. This is a binary tree with n! leaves. So, the depth of our tree (and hence the complexity of our algorithm) must be at least
log(n!) ≥ log(√(π(2n + 1/3)) · n^n · e^{−n}) = Ω(n log n),
where we use Stirling's formula.

Median

Median
The median of a list of numbers is its 50th percentile: half the numbers are bigger than it, and half are smaller. If the list has even length, there are two choices for what the middle element could be, in which case we pick the smaller of the two, say.
The purpose of the median is to summarize a set of numbers by a single, typical value. Computing the median of n numbers is easy: just sort them. The drawback is that this takes O(n log n) time, whereas we would ideally like something linear. We have reason to be hopeful, because sorting is doing far more work than we really need: we just want the middle element and don't care about the relative ordering of the rest of them.

Selection
Input: A list of numbers S; an integer k.
Output: The kth smallest element of S.

A randomized divide-and-conquer algorithm for selection
For any number v, imagine splitting list S into three categories:
- elements smaller than v, denoted S_L;
- elements equal to v, denoted S_v (there might be duplicates);
- elements greater than v, denoted S_R.
Then
selection(S, k) = selection(S_L, k)                  if k ≤ |S_L|
                = v                                   if |S_L| < k ≤ |S_L| + |S_v|
                = selection(S_R, k − |S_L| − |S_v|)   if k > |S_L| + |S_v|
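A direct Python rendering of this recursion (a sketch; it makes three linear passes for clarity and uses a random pivot, as described below):

import random

def selection(S, k):
    # Return the kth smallest element of S (1-indexed), in expected O(n) time.
    v = random.choice(S)
    SL = [x for x in S if x < v]
    Sv = [x for x in S if x == v]
    SR = [x for x in S if x > v]
    if k <= len(SL):
        return selection(SL, k)
    if k <= len(SL) + len(Sv):
        return v
    return selection(SR, k - len(SL) - len(Sv))

assert selection([10, 4, 5, 8, 6, 11, 26], 3) == 6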

How to choose v? It should be picked quickly, and it should shrink the array substantially, the ideal situation being |S_L|, |S_R| ≈ |S|/2. If we could always guarantee this situation, we would get a running time of
T(n) = T(n/2) + O(n) = O(n).
But this requires picking v to be the median, which is our ultimate goal! Instead, we follow a much simpler alternative: we pick v randomly from S.

How to choose v? (cont'd)
The worst-case scenario would force our selection algorithm to perform
n + (n − 1) + (n − 2) + ··· + n/2 = Θ(n²)
operations. Best-case scenario: O(n).
Where, in this spectrum from O(n) to Θ(n²), does the average running time lie? Fortunately, it lies very close to the best-case time.

The efficiency analysis
v is good if it lies within the 25th to 75th percentile of the array that it is chosen from. A randomly chosen v has a 50% chance of being good.
Lemma. On average a fair coin needs to be tossed two times before a heads is seen.
Proof. Let E be the expected number of tosses before a heads is seen. We need at least one toss; if it's heads, we're done. If it's tails (with probability 1/2), we need to repeat. Hence
E = 1 + (1/2)E,
whose solution is E = 2.

The efficiency analysis (cont'd)
Let T(n) be the expected running time on an array of size n. On average, after two split operations the pivot is good, and a good pivot shrinks the array to at most three quarters of its size. Hence
T(n) ≤ T(3n/4) + O(n) = O(n).

Matrix multiplication

The product of two n × n matrices X and Y is a third n × n matrix Z = XY, with (i, j)th entry
Z_ij = ∑_{k=1}^{n} X_ik Y_kj.
That is, Z_ij is the dot product of the ith row of X with the jth column of Y. In general, XY is not the same as YX; matrix multiplication is not commutative. The preceding formula implies an O(n³) algorithm for matrix multiplication.
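For reference, the formula translated directly into a naive triple loop (a sketch using lists of lists):

def matmul(X, Y):
    # Naive O(n^3) product of two n x n matrices.
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] += X[i][k] * Y[k][j]
    return Z

assert matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]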

Volker Strassen (born 1936)
In 1969, the German mathematician Volker Strassen announced a surprising O(n^{2.81}) algorithm.

Divide and conquer
Split X and Y into four n/2 × n/2 blocks:
X = [ A  B ]      Y = [ E  F ]
    [ C  D ]          [ G  H ]
Then
XY = [ A  B ] [ E  F ]  =  [ AE + BG   AF + BH ]
     [ C  D ] [ G  H ]     [ CE + DG   CF + DH ]
To compute the size-n product XY, recursively compute eight size-n/2 products AE, BG, AF, BH, CE, DG, CF, DH, and then do some O(n²)-time addition. The recurrence is
T(n) = 8T(n/2) + O(n²),
with solution O(n³).

Strassen's trick
XY = [ P₅ + P₄ − P₂ + P₆      P₁ + P₂           ]
     [ P₃ + P₄                P₁ + P₅ − P₃ − P₇ ]
where
P₁ = A(F − H)        P₅ = (A + D)(E + H)
P₂ = (A + B)H        P₆ = (B − D)(G + H)
P₃ = (C + D)E        P₇ = (A − C)(E + F)
P₄ = D(G − E)
The recurrence is
T(n) = 7T(n/2) + O(n²),
with solution O(n^{log₂ 7}) ≈ O(n^{2.81}).
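A Python sketch of Strassen's recursion, assuming n is a power of 2 (the block-splitting and addition helpers are ours):

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(X, Y):
    # Strassen's seven-multiplication scheme; n must be a power of 2.
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    h = n // 2
    A = [row[:h] for row in X[:h]]; B = [row[h:] for row in X[:h]]
    C = [row[:h] for row in X[h:]]; D = [row[h:] for row in X[h:]]
    E = [row[:h] for row in Y[:h]]; F = [row[h:] for row in Y[:h]]
    G = [row[:h] for row in Y[h:]]; H = [row[h:] for row in Y[h:]]
    P1 = strassen(A, sub(F, H))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, sub(G, E))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(sub(B, D), add(G, H))
    P7 = strassen(sub(A, C), add(E, F))
    # Reassemble the four n/2 x n/2 blocks of XY row by row.
    top = [l + r for l, r in zip(add(sub(add(P5, P4), P2), P6), add(P1, P2))]
    bot = [l + r for l, r in zip(add(P3, P4), sub(sub(add(P1, P5), P3), P7))]
    return top + bot

assert strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]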

Faster Polynomial Multiplication

Let A and B be polynomials of degree at most n:
A(x) = ∑_{j=0}^{n} a_j x^j for some a₀, ..., a_n ∈ ℝ,
B(x) = ∑_{j=0}^{n} b_j x^j for some b₀, ..., b_n ∈ ℝ.
We wish to calculate C(x) = A(x)B(x). Necessarily, deg(C) = 2n, and so
C(x) = ∑_{j=0}^{2n} c_j x^j
for some c₀, ..., c_{2n} ∈ ℝ.

Problem: Given the coefficients of A and B, compute the coefficients of C.
It is easy to see that
c_j = ∑_{i=0}^{j} a_i b_{j−i}   (0 ≤ j ≤ 2n),
taking a_i = b_i = 0 for i > n. This suggests the naive algorithm:
for j ∈ {0, ..., 2n} do
    c_j = 0
    for i ∈ {0, ..., j} do
        c_j += a_i · b_{j−i}
We use the real-number model of computation: an arithmetic operation on real numbers has unit cost. Cost of the naive algorithm? Θ(n²). Can we do better? Yes!

Without loss of generality, assume n is a power of two. Write
A(x) = A₁(x) x^{n/2} + A₂(x)
B(x) = B₁(x) x^{n/2} + B₂(x)
where deg A₁, deg A₂, deg B₁, deg B₂ ≤ n/2. Then
C(x) = A₁(x)B₁(x) x^n + (A₂(x)B₁(x) + A₁(x)B₂(x)) x^{n/2} + A₂(x)B₂(x).
Use Gauss's trick:
P₁(x) = A₁(x)B₁(x)
P₂(x) = A₂(x)B₂(x)
P₃(x) = (A₁(x) + A₂(x))(B₁(x) + B₂(x))
S(x) = P₃(x) − P₁(x) − P₂(x)
C(x) = P₁(x) x^n + S(x) x^{n/2} + P₂(x)
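A Python sketch of this polynomial version of Gauss's trick, assuming the coefficient lists (low degree first) have length a power of two:

def poly_mult(a, b):
    # a, b: equal-length coefficient lists; length must be a power of 2.
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a2, a1 = a[:h], a[h:]   # A(x) = A1(x) x^h + A2(x)
    b2, b1 = b[:h], b[h:]
    p1 = poly_mult(a1, b1)
    p2 = poly_mult(a2, b2)
    p3 = poly_mult([x + y for x, y in zip(a1, a2)],
                   [x + y for x, y in zip(b1, b2)])
    s = [x - y - z for x, y, z in zip(p3, p1, p2)]   # S = P3 - P1 - P2
    c = [0] * (2 * n - 1)
    for i, v in enumerate(p2): c[i] += v             # P2
    for i, v in enumerate(s): c[i + h] += v          # S x^h
    for i, v in enumerate(p1): c[i + 2 * h] += v     # P1 x^{2h}
    return c

assert poly_mult([1, 2], [3, 4]) == [3, 10, 8]   # (1 + 2x)(3 + 4x)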

We then find
T(n) = 3T(n/2) + Θ(n).
By the Master Theorem, we have T(n) = Θ(n^{log₂ 3}) ≈ Θ(n^{1.59}). Can we do better? Yes!

If we follow the approach that the Toom-Cook algorithm uses in multiplying n-bit numbers, we can show that for every ε > 0, we can multiply two polynomials of degree n using Θ(n^{1+ε}) arithmetic operations. However, the constant hidden in the Θ blows up as ε → 0. Can we do better? Yes! (Use the fast Fourier transform.)

The fast Fourier transform

Polynomial multiplication
If A(x) = a₀ + a₁x + ··· + a_d x^d and B(x) = b₀ + b₁x + ··· + b_d x^d, their product
C(x) = A(x) · B(x) = c₀ + c₁x + ··· + c_{2d} x^{2d}
has coefficients
c_k = a₀b_k + a₁b_{k−1} + ··· + a_k b₀ = ∑_{i=0}^{k} a_i b_{k−i},
where for i > d we take a_i and b_i to be zero. Computing c_k from this formula takes O(k) steps, and finding all 2d + 1 coefficients would therefore seem to require Θ(d²) time.

An alternative representation of polynomials
Fact. A degree-d polynomial is uniquely characterized by its values at any d + 1 distinct points.
E.g., any two points determine a line.
We can specify a degree-d polynomial A(x) = a₀ + a₁x + ··· + a_d x^d by either one of the following:
1. Its coefficients a₀, a₁, ..., a_d (coefficient representation).
2. The values A(x₀), A(x₁), ..., A(x_d) (value representation).

An alternative representation of polynomials (cont'd)
coefficient representation → (evaluation) → value representation
value representation → (interpolation) → coefficient representation
- Since the product C(x) has degree 2d, it is completely determined by its values at any 2d + 1 points.
- The value of C(z) at any given point z is just A(z) times B(z).
Thus polynomial multiplication takes linear time in the value representation.

The algorithm
Input: Coefficients of two polynomials, A(x) and B(x), of degree d
Output: Their product C = A · B

Selection: Pick some points x₀, x₁, ..., x_{n−1}, where n ≥ 2d + 1
Evaluation: Compute A(x₀), A(x₁), ..., A(x_{n−1}) and B(x₀), B(x₁), ..., B(x_{n−1})
Multiplication: Compute C(x_k) = A(x_k)B(x_k) for all k = 0, ..., n − 1
Interpolation: Recover C(x) = c₀ + c₁x + ··· + c_{2d} x^{2d}
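Before deriving the FFT itself, the whole pipeline can be sketched with numpy's built-in FFT (np.fft.fft evaluates at the complex nth roots of unity; the final rounding assumes integer coefficients):

import numpy as np

def poly_mult_fft(a, b):
    # a, b: coefficient lists, low degree first.
    n = 1
    while n < len(a) + len(b) - 1:   # need at least 2d + 1 points
        n *= 2
    fa = np.fft.fft(a, n)            # evaluation (zero-padded to n points)
    fb = np.fft.fft(b, n)
    fc = fa * fb                     # pointwise multiplication
    c = np.fft.ifft(fc).real        # interpolation
    return [round(x) for x in c[:len(a) + len(b) - 1]]

assert poly_mult_fft([1, 2], [3, 4]) == [3, 10, 8]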

The selection step and the n multiplications are just linear time. (In a typical setting for polynomial multiplication, the coefficients of the polynomials are real numbers and, moreover, are small enough that basic arithmetic operations, adding and multiplying, take unit time.)
Evaluating a polynomial of degree d ≤ n at a single point takes O(n), and so the baseline for n points is Θ(n²). The fast Fourier transform (FFT) does it in just O(n log n) time, for a particularly clever choice of x₀, ..., x_{n−1} in which the computations required by the individual points overlap with one another and can be shared.

Evaluation by divide-and-conquer
We pick the n points ±x₀, ±x₁, ..., ±x_{n/2−1}; then the computations required for each A(x_i) and A(−x_i) overlap a lot, because the even powers of x_i coincide with those of −x_i.
To investigate this, we need to split A(x) into its odd and even powers, for instance
3 + 4x + 6x² + 2x³ + x⁴ + 10x⁵ = (3 + 6x² + x⁴) + x(4 + 2x² + 10x⁴).
More generally,
A(x) = A_e(x²) + x A_o(x²),
where A_e(·), with the even-numbered coefficients, and A_o(·), with the odd-numbered coefficients, are polynomials of degree ≤ n/2 − 1 (assume for convenience that n is even).

Evaluation by divide-and-conquer (cont'd)
Given paired points ±x_i, the calculations needed for A(x_i) can be recycled toward computing A(−x_i):
A(x_i) = A_e(x_i²) + x_i A_o(x_i²)
A(−x_i) = A_e(x_i²) − x_i A_o(x_i²).
In other words, evaluating A(x) at n paired points ±x₀, ..., ±x_{n/2−1} reduces to evaluating A_e(x) and A_o(x) (which each have half the degree of A(x)) at just n/2 points, x₀², ..., x_{n/2−1}².
If we could recurse, we would get a divide-and-conquer procedure with running time
T(n) = 2T(n/2) + O(n) = O(n log n).

How to choose the n points?
We have a problem: to recurse at the next level, we need the n/2 evaluation points x₀², x₁², ..., x_{n/2−1}² to be themselves plus-minus pairs. How can a square be negative? We use complex numbers.
At the very bottom of the recursion, we have a single point. This point might as well be 1, in which case the level above it must consist of its square roots, ±√1 = ±1. The next level up then has ±√(+1) = ±1 as well as the complex numbers ±√(−1) = ±i, where i is the imaginary unit. By continuing in this manner, we eventually reach the initial set of n points: the complex nth roots of unity, that is, the n complex solutions to the equation z^n = 1.

The complex roots of unity
The nth roots of unity are the complex numbers 1, ω, ω², ..., ω^{n−1}, where ω = e^{2πi/n}. If n is even, then:
1. The nth roots are plus-minus paired: ω^{n/2+j} = −ω^j.
2. Squaring them produces the (n/2)nd roots of unity.
Therefore, if we start with these numbers for some n that is a power of 2, then at successive levels of recursion we will have the (n/2^k)th roots of unity, for k = 0, 1, 2, 3, .... All these sets of numbers are plus-minus paired, and so our divide-and-conquer works perfectly. The resulting algorithm is the fast Fourier transform.

The fast Fourier transform (polynomial formulation)

FFT(A, ω)
// Input: coefficient representation of a polynomial A(x) of degree ≤ n − 1,
//   where n is a power of 2; ω, an nth root of unity
// Output: value representation A(ω⁰), ..., A(ω^{n−1})
1. if ω = 1 then return A(1)
2. express A(x) in the form A_e(x²) + x A_o(x²)
3. call FFT(A_e, ω²) to evaluate A_e at even powers of ω
4. call FFT(A_o, ω²) to evaluate A_o at even powers of ω
5. for j = 0 to n − 1 do
6.   compute A(ω^j) = A_e(ω^{2j}) + ω^j A_o(ω^{2j})
7. return A(ω⁰), ..., A(ω^{n−1})
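A compact runnable Python version (a sketch; it splits on even/odd coefficient indices and uses the plus-minus pairing A(−ω^j) = A_e(ω^{2j}) − ω^j A_o(ω^{2j}), anticipating the general formulation below):

import cmath

def fft(a, w):
    # a: coefficient list of length n (a power of 2); w: a primitive nth root of unity.
    n = len(a)
    if n == 1:
        return [a[0]]
    se = fft(a[0::2], w * w)   # A_e evaluated at powers of w^2
    so = fft(a[1::2], w * w)   # A_o evaluated at powers of w^2
    out = [0] * n
    x = 1
    for j in range(n // 2):
        out[j] = se[j] + x * so[j]            # A(w^j)
        out[j + n // 2] = se[j] - x * so[j]   # A(w^{j + n/2}) = A(-w^j)
        x *= w
    return out

# Evaluate A(x) = 1 + 2x + 3x^2 + 4x^3 at the 4th roots of unity.
vals = fft([1, 2, 3, 4], cmath.exp(2j * cmath.pi / 4))
assert abs(vals[0] - 10) < 1e-9   # A(1) = 1 + 2 + 3 + 4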

Interpolation
We designed the FFT, a way to move from coefficients to values in time just O(n log n), when the points {x_i} are the complex nth roots of unity, 1, ω, ω², ..., ω^{n−1}:
⟨values⟩ = FFT(⟨coefficients⟩, ω).
We will see that the interpolation can be computed by
⟨coefficients⟩ = (1/n) FFT(⟨values⟩, ω^{−1}).

A matrix reformulation
Let's explicitly set down the relationship between our two representations for a polynomial A(x) of degree ≤ n − 1:

[ A(x₀)      ]   [ 1   x₀       x₀²       ···  x₀^{n−1}      ] [ a₀      ]
[ A(x₁)      ] = [ 1   x₁       x₁²       ···  x₁^{n−1}      ] [ a₁      ]
[   ⋮        ]   [              ⋮                            ] [  ⋮      ]
[ A(x_{n−1}) ]   [ 1   x_{n−1}  x_{n−1}²  ···  x_{n−1}^{n−1} ] [ a_{n−1} ]

Let M be the matrix in the middle, which is a Vandermonde matrix: if x₀, ..., x_{n−1} are distinct numbers, then M is invertible. Hence:
Evaluation is multiplication by M, while interpolation is multiplication by M^{−1}.

Interpolation resolved
In linear algebra terms, the FFT multiplies an arbitrary n-dimensional vector (which we have been calling the coefficient representation) by the n × n matrix

           [ 1   1         1           ···  1                ]
           [ 1   ω         ω²          ···  ω^{n−1}          ]
M_n(ω) =   [               ⋮                                 ]
           [ 1   ω^j       ω^{2j}      ···  ω^{(n−1)j}       ]
           [               ⋮                                 ]
           [ 1   ω^{n−1}   ω^{2(n−1)}  ···  ω^{(n−1)(n−1)}   ]

Its (j, k)th entry (starting row- and column-count at zero) is ω^{jk}. We will show the inversion formula
M_n(ω)^{−1} = (1/n) M_n(ω^{−1}).

Interpolation resolved (cont'd)
Take ω to be e^{2πi/n} and think of the columns of M as vectors in ℂⁿ. Recall that the angle between two vectors u = (u₀, ..., u_{n−1}) and v = (v₀, ..., v_{n−1}) in ℂⁿ is just a scaling factor times their inner product
u · v̄ = u₀v̄₀ + u₁v̄₁ + ··· + u_{n−1}v̄_{n−1},
where z̄ denotes the complex conjugate of z. Recall that the complex conjugate of a complex number z = re^{iθ} is z̄ = re^{−iθ}; the complex conjugate of a vector (or matrix) is obtained by taking the complex conjugates of all its entries.
This quantity is maximized when the vectors lie in the same direction and is zero when the vectors are orthogonal to each other.

Interpolation resolved (cont'd)
Lemma. The columns of matrix M are orthogonal to each other.
Proof. Take the inner product of any columns j and k of matrix M:
1 + ω^{j−k} + ω^{2(j−k)} + ··· + ω^{(n−1)(j−k)}.
This is a geometric series with first term 1, last term ω^{(n−1)(j−k)}, and ratio ω^{j−k}. Therefore, if j ≠ k, it evaluates to
(1 − ω^{n(j−k)}) / (1 − ω^{j−k}) = 0,
and if j = k, it evaluates to n.
Corollary. MM̄ = nI, i.e., M_n(ω)^{−1} = (1/n) M_n(ω^{−1}).
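A quick numerical check of the corollary (a sketch using numpy; n = 8 is an arbitrary choice):

import numpy as np

n = 8
w = np.exp(2j * np.pi / n)
# M_n(omega): the (j, k) entry is omega^(j*k).
M = np.array([[w ** (j * k) for k in range(n)] for j in range(n)])
# Orthogonal columns: M times its conjugate is n times the identity.
assert np.allclose(M @ M.conj(), n * np.eye(n))
# Equivalently, the inverse of M_n(omega) is (1/n) M_n(omega^{-1}).
assert np.allclose(np.linalg.inv(M), M.conj() / n)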

The fast Fourier transform (general formulation)

FFT(a, ω)
// Input: an array a = (a₀, a₁, ..., a_{n−1}), for n a power of 2;
//   a primitive nth root of unity, ω
// Output: M_n(ω) · a
1. if ω = 1 then return a
2. (s₀, s₁, ..., s_{n/2−1}) = FFT((a₀, a₂, ..., a_{n−2}), ω²)
3. (s′₀, s′₁, ..., s′_{n/2−1}) = FFT((a₁, a₃, ..., a_{n−1}), ω²)
4. for j = 0 to n/2 − 1 do
5.   r_j = s_j + ω^j s′_j
6.   r_{j+n/2} = s_j − ω^j s′_j
7. return (r₀, r₁, ..., r_{n−1})