CSE 548: (Design and) Analysis of Algorithms


1 CSE 548: (Design and) Analysis of Algorithms
Divide-and-conquer Algorithms
R. Sekar

Topics: Warmup, Sorting, Selection, Closest pair, Multiplication, FFT

2 Divide-and-Conquer: A versatile strategy

Steps:
- Break a problem into subproblems that are smaller instances of the same problem
- Recursively solve these subproblems
- Combine these answers to obtain the solution to the problem

Benefits:
- Conceptual simplification
- Speed-up: rapidly (exponentially) reduce the problem space; exploit commonalities in subproblem solutions
- Parallelism: divide-and-conquer algorithms are amenable to parallelization
- Locality: their depth-first nature increases locality, which is extremely important for today's processors

3 Topics

1. Warmup: Overview, Search, H-Tree, Exponentiation
2. Sorting: Mergesort, Recurrences, Fibonacci Numbers, Quicksort, Lower Bound, Radix sort
3. Selection: Select k-th min, Priority Queues
4. Closest pair
5. Multiplication: Matrix Multiplication, Integer multiplication
6. FFT: Fourier Transform, DFT, FFT Algorithm, Fast multiplication

4 Binary Search

Problem: Find a key k in an ordered collection. Examples:
- Sorted array A[1..n]: Compare k with A[n/2], then recursively search A[1..(n/2 - 1)] (if k < A[n/2]) or A[n/2..n] (otherwise)
- Binary search tree T: Compare k with root(T); based on the result, recursively search the left or right subtree of the root
- B-Tree: A hybrid of the above two. The root stores an array M of m keys and has m + 1 children. Use binary search on M to identify which child can contain k, then recursively search that subtree
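The sorted-array case above can be sketched in a few lines. This is a minimal Python version (the slides themselves use pseudocode); the iterative form and the -1 "not found" convention are choices of this sketch, not the slide's.

```python
def binary_search(A, k):
    """Return an index i with A[i] == k, or -1 if k is absent.
    A must be sorted: each comparison halves the search range."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == k:
            return mid
        elif k < A[mid]:
            hi = mid - 1      # search left half
        else:
            lo = mid + 1      # search right half
    return -1
```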

5 H-tree: Planar embedding of a full binary tree

Key properties:
- fractal geometry
- divide-and-conquer structure
- n nodes in O(n) area
- all root-to-leaf paths equal in length

Applications:
- compact embedding of binary trees in VLSI hardware
- clock distribution

6 Divide-and-conquer Construction of H-tree

MkHtree(l, b, r, t, n)
    horizline((b+t)/2, l + (r-l)/4, r - (r-l)/4)
    vertline(l + (r-l)/4, b + (t-b)/4, t - (t-b)/4)
    vertline(r - (r-l)/4, b + (t-b)/4, t - (t-b)/4)
    if n <= 4: return
    MkHtree(l, (b+t)/2, (l+r)/2, t, n/4)
    MkHtree((l+r)/2, (b+t)/2, r, t, n/4)
    MkHtree(l, b, (l+r)/2, (b+t)/2, n/4)
    MkHtree((l+r)/2, b, r, (b+t)/2, n/4)

7 Divide-and-conquer Construction of H-tree: Questions

- How compact is the embedding? (Ratio of the minimum distance between nodes to the average area per node)
- What is the root-to-leaf path length? Can we do better?
- Finally, how can we show that the algorithm is correct?

8 Exponentiation

How many multiplications are required to compute x^n? Can we use a divide-and-conquer approach to make it faster?

ExpBySquaring(n, x)
    if n > 1
        y = ExpBySquaring(floor(n/2), x^2)
        if odd(n): y = x * y
        return y
    else return x
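A direct Python transcription of ExpBySquaring above, for n >= 1 (the slide's pseudocode does not handle n = 0, and neither does this sketch):

```python
def exp_by_squaring(n, x):
    """Compute x**n with O(log n) multiplications:
    recurse on floor(n/2) with x squared, fixing up odd n."""
    if n > 1:
        y = exp_by_squaring(n // 2, x * x)
        if n % 2 == 1:
            y = x * y
        return y
    return x
```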

9 Merge Sort

The problem of sorting a list of numbers lends itself immediately to a divide-and-conquer strategy: split the list into two halves, recursively sort each half, and then merge the two sorted sublists.

function mergesort(a[1..n])
    Input: An array of numbers a[1..n]
    Output: A sorted version of this array
    if n > 1:
        return merge(mergesort(a[1..floor(n/2)]), mergesort(a[floor(n/2)+1..n]))
    else:
        return a

The correctness of this algorithm is self-evident, as long as a correct merge subroutine is specified. If we are given two sorted arrays x[1..k] and y[1..l], how do we efficiently merge them into a single sorted array z[1..k+l]? The very first element of z is either x[1] or y[1], whichever is smaller; the rest of z can then be constructed recursively.

10 Merge Sort (Continued)

function merge(x[1..k], y[1..l])
    if k = 0: return y[1..l]
    if l = 0: return x[1..k]
    if x[1] <= y[1]:
        return x[1] . merge(x[2..k], y[1..l])
    else:
        return y[1] . merge(x[1..k], y[2..l])

Here . denotes concatenation.
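The two routines above can be sketched in Python as follows. The merge is written iteratively here (rather than with the slide's recursion) purely to avoid deep call stacks; the logic is the same comparison of front elements.

```python
def merge(x, y):
    """Merge two sorted lists into one sorted list."""
    z, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            z.append(x[i]); i += 1
        else:
            z.append(y[j]); j += 1
    return z + x[i:] + y[j:]   # at most one tail is non-empty

def mergesort(a):
    """Split in half, recursively sort, merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```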

11 Merge Sort Illustration

(Figure: an example run of merge sort on an input array.)

12 Merge sort time complexity

mergesort(A) makes two recursive invocations of itself, each with an array half the size of A. merge(A, B) takes time that is linear in |A| + |B|. Thus, the runtime is given by the recurrence

    T(n) = 2T(n/2) + n

In divide-and-conquer algorithms, we often encounter recurrences of the form

    T(n) = aT(n/b) + O(n^d)

Can we solve them once for all?

13 Master Theorem

If T(n) = aT(n/b) + O(n^d) for constants a > 0, b > 1, and d >= 0, then

    T(n) = O(n^d)            if d > log_b a
    T(n) = O(n^d log n)      if d = log_b a
    T(n) = O(n^(log_b a))    if d < log_b a
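The three cases above are mechanical enough to encode. This is a small illustrative helper (my own, not from the slides) that classifies a recurrence T(n) = a*T(n/b) + O(n^d) by comparing d with the critical exponent log_b a:

```python
import math

def master_theorem(a, b, d):
    """Return the asymptotic bound for T(n) = a*T(n/b) + O(n^d),
    for constants a > 0, b > 1, d >= 0."""
    crit = math.log(a, b)                # log_b a, the critical exponent
    if math.isclose(d, crit):
        return f"O(n^{d} log n)"
    elif d > crit:
        return f"O(n^{d})"
    else:
        return f"O(n^{crit:.2f})"
```

For example, mergesort's recurrence (a=2, b=2, d=1) lands in the middle case.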

14 Proof of Master Theorem

(Figure: each problem of size n is divided into a subproblems of size n/b; the recursion tree has branching factor a, depth log_b n, sizes n, n/b, n/b^2, ..., 1 at successive levels, and width a^(log_b n) = n^(log_b a) at the bottom.)

Can be proved by induction, or by summing up a series in which each term represents the work done at one level of this tree: level k has a^k subproblems, each of size n/b^k, so the total work done at that level is

    a^k (n/b^k)^d = n^d (a/b^d)^k

15 What if the Master Theorem can't be applied?

- Guess and check (prove by induction): expand the recursion for a few steps to make a guess; in principle, can be applied to any recurrence
- Akra-Bazzi method (not covered in class): handles recurrences much more complex than those of the Master theorem

16 More on time complexity: Fibonacci Numbers

function fib1(int n)
    if n = 0 return 0;
    if n = 1 return 1;
    return fib1(n-1) + fib1(n-2)

Is this algorithm correct? Yes: it follows the definition of Fibonacci.
What is its runtime? T(n) = T(n-1) + T(n-2) + 3, with T(k) <= 2 for k < 2. The solution is an exponential function. (Prove this by induction!)
Can we do better?

17 Structure of calls to fib1

(Figure 0.1: the proliferation of recursive calls in fib1. F_n calls F_{n-1} and F_{n-2}; each of these calls two more, and so on.)

A complete binary tree of depth n, containing on the order of 2^n calls to fib1. But there are only n distinct Fibonacci numbers being computed! Each Fibonacci number is computed an exponential number of times.

18 Improved Algorithm for Fibonacci

A linear-time algorithm!

function fib2(n)
    int f[max(2, n+1)];
    f[0] = 0; f[1] = 1;
    for (i = 2; i <= n; i++)
        f[i] = f[i-1] + f[i-2];
    return f[n]

But wait! We are operating on very large numbers: the n-th Fibonacci number requires approx. 0.694n bits. (Prove this by induction!) An operation on k-bit numbers requires about k basic operations; i.e., computing F_n requires on the order of 0.694 n^2 bit operations.
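fib2 above translates directly into Python (the guard for n < 2 replaces the slide's max(2, n+1) array sizing):

```python
def fib2(n):
    """Iterative Fibonacci: n - 1 additions, instead of fib1's
    exponentially many recursive calls."""
    if n < 2:
        return n
    f = [0] * (n + 1)
    f[0], f[1] = 0, 1
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]
```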

19 Quicksort

qs(A, l, h) /* sorts A[l..h] */
    if l >= h: return;
    (h1, l2) = partition(A, l, h);
    qs(A, l, h1);
    qs(A, l2, h)

partition(A, l, h)
    k = selectpivot(A, l, h);
    p = A[k]; swap(A, h, k);
    i = l - 1; j = h;
    while true do
        do i++ while A[i] < p;
        do j-- while A[j] > p;
        if i >= j: break;
        swap(A, i, j);
    swap(A, i, h);
    return (j, i + 1)
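A compact Python sketch of quicksort. Note this is deliberately simplified relative to the slide: it uses a random pivot, builds new lists instead of doing the in-place partition above, and uses a three-way split so duplicate keys are handled cleanly.

```python
import random

def quicksort(a):
    """Quicksort sketch: pick a pivot, split into smaller / equal /
    larger, and recurse on the two outer parts."""
    if len(a) <= 1:
        return a
    p = random.choice(a)
    left = [x for x in a if x < p]
    mid = [x for x in a if x == p]
    right = [x for x in a if x > p]
    return quicksort(left) + mid + quicksort(right)
```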

20 Analysis of Runtime of qs

General case: given by the recurrence T(n) = n + T(n1) + T(n2), where n1 and n2 are the sizes of the two sub-arrays after partition.

Best case: n1 = n2 = n/2. By the master theorem, T(n) = O(n log n).
Worst case: n1 = 1, n2 = n - 1, giving T(n) = O(n^2). A fixed choice of pivot index, say h, leads to worst-case behavior on common inputs, e.g., sorted input.

Lucky/unlucky split: alternate between best- and worst-case splits:
    T(n) = n + T(1) + T(n-1)                (worst-case split)
         = n + 1 + (n-1) + 2T((n-1)/2)      (best-case split on n-1)
         = 2n + 2T((n-1)/2)
which has an O(n log n) solution.

Three-fourths split: T(n) = n + T(0.25n) + T(0.75n) <= n + 2T(0.75n) = O(n log n)

21 Average case analysis of qs

Define the input distribution: all permutations equally likely.
Simplifying assumption: all elements are distinct. (Nonessential assumption.)
Set up the recurrence: when all permutations are equally likely, the selected pivot has an equal chance of ending up at the i-th position in the sorted order, for all 1 <= i <= n. Thus, we have the following recurrence for the average case:

    T(n) = n + (1/n) * sum_{i=1}^{n-1} (T(i) + T(n-i))

Solve the recurrence: we cannot apply the master theorem, but since we seem to get an O(n log n) bound even in seemingly bad cases, we can try to establish a c*n*log n bound via induction.

22 Establishing the average case of qs

Establish the base case. (Trivial.)
The induction step involves a summation of the form sum_{i=1}^{n-1} i log i.
- Attempt 1: bound log i above by log n. (Induction fails.)
- Attempt 2: split the sum into two parts,
      sum_{i=1}^{n/2} i log i + sum_{i=n/2+1}^{n-1} i log i,
  and apply the approximation to each half. (Succeeds with c >= 4.)
- Attempt 3: replace the summation with the upper bound
      integral_1^n x log x dx = [ (x^2/2)(log x - 1/2) ]_1^n.
  (Succeeds with the constraint c >= 2.)

23 Randomized Quicksort

Picks a pivot at random. What is its complexity? For randomized algorithms, we talk about expected complexity, which is an average over all possible values of the random variable. If the pivot index is picked uniformly at random over the interval [l, h], then:
- every array element is equally likely to be selected as the pivot
- every partition is equally likely
- thus, the expected complexity of randomized quicksort is given by the same recurrence as the average case of qs

24 An n log n lower bound for sorting

Comparison-based sorting algorithms can be depicted as trees. The tree in the figure sorts an array of three elements a1, a2, a3: it starts by comparing a1 to a2 and, if the first is larger, compares it with a3; otherwise it compares a2 and a3; and so on. Eventually we end up at a leaf, and this leaf is labeled with the true order of the three elements as a permutation of 1, 2, 3. For example, if a2 < a1 < a3, we get the leaf labeled "2 1 3". Each leaf identifies the permutation of the input that yields a sorted order.

(Figure: decision tree with internal nodes "a1 < a2?", "a1 < a3?", "a2 < a3?" and leaves labeled with permutations.)

The depth of the tree, i.e., the number of comparisons on the longest root-to-leaf path (3, in this case), is exactly the worst-case number of comparisons made by the algorithm. A tree that sorts n elements must have a leaf for every permutation of {1, 2, ..., n}, hence n! leaves, and hence height at least log n!. By Stirling's approximation,

    n! ~ sqrt(2*pi*n) (n/e)^n,  so  log n! = Omega(n log n)

No comparison-based sorting algorithm can do better! In this sense, mergesort (with its O(n log n) comparisons) is optimal.

25 Bucket sort

Overview:
- Divide: partition the input into intervals (buckets), based on key values; a linear scan of the input drops each element into the appropriate bucket
- Recurse: sort each bucket
- Combine: concatenate bucket contents

(Example figures from Wikimedia Commons.)

26 Bucket sort (Continued)

Bucket sort generalizes quicksort to multiple partitions, with combination = concatenation. The worst-case quadratic bound applies, but performance can be much better if the input distribution is uniform. (Exercise: what is the runtime in this case?) Used by letter-sorting machines in post offices.

27 Counting Sort

A special case of bucket sort where each bucket corresponds to an interval of size 1.
- No need to recurse: divide = conquered!
- Makes sense only if the range of key values is small (usually constant)
- Thus, counting sort can be done in O(n) time!

Hmm. How did we beat the O(n log n) lower bound?
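A minimal Python sketch of counting sort for integer keys in a small known range (the parameter name key_range is mine). Note it never compares two elements, which is how it sidesteps the comparison-based lower bound:

```python
def counting_sort(a, key_range):
    """Sort integers in [0, key_range): one counter per key value,
    then emit each value as many times as it occurred.
    Runs in O(n + key_range) time."""
    counts = [0] * key_range
    for x in a:
        counts[x] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)
    return out
```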

28 Radix Sorting

Treat an integer as a sequence of digits, and sort the digits using counting sort.
- LSD sorting: sort first on the least significant digit, and on the most significant digit last. After each round of counting sort, results can simply be concatenated and given as input to the next stage.
- MSD sorting: sort first on the most significant digit, and on the least significant digit last. Unlike LSD sorting, we cannot concatenate after each stage.

Note: radix sort does not divide the input into smaller subsets. If you think of the input as multi-dimensional data, then we break the problem down by dimension.
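The LSD variant above can be sketched as follows. For brevity this sketch uses per-digit bucket lists rather than an explicit counting-sort pass; the essential point is the same: each round is stable, so earlier rounds' order is preserved among equal digits.

```python
def lsd_radix_sort(a, base=10):
    """LSD radix sort of nonnegative integers: one stable pass per
    digit position, least significant first, concatenating buckets
    after each pass."""
    if not a:
        return a
    place = 1
    while place <= max(a):
        buckets = [[] for _ in range(base)]
        for x in a:                           # stable: preserves order
            buckets[(x // place) % base].append(x)
        a = [x for b in buckets for x in b]   # concatenate buckets
        place *= base
    return a
```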

29 Stable sorting algorithms

Stable sorting algorithms don't change the order of equal elements. Merge sort and LSD sort are stable; quicksort is not.
Why is stability important? The effect of sorting on attribute A and then on B is the same as sorting on the key (B, A). LSD sort won't work without this property! Other examples: sorting spreadsheets or tables on web pages.

(Images from Wikipedia Commons.)

30 Sorting strings

Can use LSD or MSD sorting. Easy if all strings are of the same length; requires a bit more care with variable-length strings. Starting point: use a special terminator character t with t < a for all valid characters a. It is easy to devise an O(nl) algorithm, where n is the number of strings and l is the maximum length of any string, but such an algorithm is not linear in the input size.

Exercise: devise a linear-time string sorting algorithm. Given a set S of strings, your algorithm should sort in O(|S|) time, where |S| = sum_{s in S} |s|.

31 Select the k-th smallest element

Obvious approach: sort, pick the k-th element. Wasteful: O(n log n).
Better approach: recursive partitioning, searching only on one side.

qsel(A, l, h, k)
    if l = h: return A[l];
    (h1, l2) = partition(A, l, h);
    if k <= h1: return qsel(A, l, h1, k)
    else: return qsel(A, l2, h, k)

Complexity:
- Best case, even splits: T(n) = n + T(n/2), which has an O(n) solution
- Skewed 10%/90% splits: T(n) <= n + T(0.9n), still linear
- Worst case: T(n) = n + T(n-1), quadratic!
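A Python sketch of quickselect. Like the quicksort sketch, it uses a random pivot and a three-way split on new lists rather than the slide's in-place partition; k is 1-based (k = 1 gives the minimum).

```python
import random

def qsel(a, k):
    """Return the k-th smallest element of a (1 <= k <= len(a)):
    partition around a random pivot, then recurse into one side only."""
    p = random.choice(a)
    left = [x for x in a if x < p]
    mid = [x for x in a if x == p]
    if k <= len(left):
        return qsel(left, k)
    elif k <= len(left) + len(mid):
        return p
    else:
        return qsel([x for x in a if x > p], k - len(left) - len(mid))
```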

32 Worst-case O(n) Selection

Intuition: spend a bit more time to select a pivot that ensures reasonably balanced partitions.

MoM Algorithm [Blum, Floyd, Pratt, Rivest and Tarjan 1973]

33 O(n) Selection: MoM Algorithm

Quick select (qsel) takes no time to pick a pivot, but then spends O(n) to partition. Can we spend more time upfront to make a better selection of the pivot, so that we can avoid highly skewed splits?

Key Idea: use the selection algorithm itself to choose the pivot.
- Divide the input into sets of 5 elements
- Compute the median of each set (O(5), i.e., constant time per set)
- Use selection recursively on these n/5 medians to pick their median, i.e., choose the median of medians (MoM) as the pivot
- Partition using MoM, and recurse to find the k-th element

34 O(n) Selection: MoM Algorithm (Continued)

Theorem: a MoM-based split won't be worse than 30%/70%.
Result: a guaranteed linear-time algorithm!
Caveat: the constant factor is non-negligible; use MoM as a fall-back if random pivot selection repeatedly yields unbalanced splits.
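The MoM steps above can be sketched as follows (k is 1-based; list-building instead of in-place partitioning, as in the earlier sketches, so this shows the pivot rule rather than a tuned implementation):

```python
def mom_select(a, k):
    """Deterministic k-th smallest via the median-of-medians pivot."""
    if len(a) <= 5:
        return sorted(a)[k - 1]
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]       # sets of 5
    medians = [sorted(g)[len(g) // 2] for g in groups]       # O(1) each
    pivot = mom_select(medians, (len(medians) + 1) // 2)     # MoM pivot
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    if k <= len(left):
        return mom_select(left, k)
    elif k <= len(left) + len(mid):
        return pivot
    return mom_select([x for x in a if x > pivot],
                      k - len(left) - len(mid))
```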

35 Selecting the maximum element: Priority Queues

Heap: a tree-based data structure for priority queues.
Heap property: H is a heap if, for every subtree h of H, every k in keys(h) satisfies k <= root(h), where keys(h) includes all keys appearing within h. Note: no ordering among siblings or cousins.
Supports insert, deletemax and max. Typically implemented using arrays, i.e., without an explicit tree data structure.
The task of maintaining the max is distributed to subsets of the entire set; alternatively, a heap can be thought of as maintaining several parallel queues with a single head.

(Images from Wikimedia Commons.)

36 Binary heap

Array representation: store heap elements in breadth-first order in the array; node i's children are at indices 2i and 2i + 1. Conceptually, we are dealing with a balanced binary tree.
- Max: the element at the root of the array; extracted in O(1) time
- DeleteMax: overwrite the root with the last element of the heap; fixing the heap takes O(log n) time, since only one root-to-leaf path needs to be fixed up
- Insert: append the element to the end of the array, then fix up the heap
- MkHeap: fix up the entire heap; takes O(n) time
- Heapsort: an O(n log n) algorithm; MkHeap followed by n calls to DeleteMax
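A sketch of the array-based max-heap and heapsort described above. One detail differs from the slide: Python arrays are 0-based, so node i's children sit at 2i+1 and 2i+2 instead of 2i and 2i+1.

```python
def sift_down(h, i, n):
    """Restore the heap property at index i within h[0:n]
    (0-based children: 2i+1 and 2i+2)."""
    while True:
        largest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and h[c] > h[largest]:
                largest = c
        if largest == i:
            return
        h[i], h[largest] = h[largest], h[i]
        i = largest

def heapsort(a):
    """MkHeap in O(n), then n DeleteMax operations at O(log n) each."""
    h, n = list(a), len(a)
    for i in range(n // 2 - 1, -1, -1):    # MkHeap: fix up bottom-up
        sift_down(h, i, n)
    for end in range(n - 1, 0, -1):
        h[0], h[end] = h[end], h[0]        # DeleteMax: max goes to the end
        sift_down(h, 0, end)
    return h
```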

37 Finding the closest pair of points

Problem: given a set of n points in a d-dimensional space, identify the two that have the smallest Euclidean distance between them.
Applications: a central problem in graphics, vision, air-traffic control, navigation, molecular modeling, and so on.

(Images from Wikipedia Commons.)

38 Divide-and-conquer closest pair (2D)

Divide: identify k such that the line x = k divides the points evenly. (A median computation; takes O(n) time.)
Recursive case: find the closest pair in each half.
Combine: we can't just take the min of the closest pairs from the two halves; we also need to consider pairs that straddle the dividing line, and that seems to take O(n^2) time!

39 Speeding up the search for cross-region pairs

Key Observation 1: let delta_1 and delta_2 be the minimum distances within each half. We need only consider points within delta = min(delta_1, delta_2) of the dividing line.
We expect only a small number of points to lie within such a narrow strip, but in the worst case, every point could be within the strip!

40 Sparsity condition

Consider a point p in the left delta-strip. How many points q_1, ..., q_r in the right delta-strip could be within delta of p?
Key Observation 2: q_1, ..., q_r must all lie within a 2*delta x delta rectangle, and r can't be too large: otherwise q_1, ..., q_r would crowd together, closer than delta to each other.
Theorem: r <= 6. So we need to consider at most 6n cross-region pairs! The count remains O(n) in higher dimensions as well.

41 Closest pair: Summary

Recurrence: T(n) = 2T(n/2) + Omega(n), since the median computation is already linear-time. Thus, T(n) = Omega(n log n). To get to O(n log n), we need to:
1. Compute the delta-strip in O(n) time: keep the points in each region sorted by x-coordinate. This takes an additional O(n log n) time, with no change to the overall complexity.
2. Compute q_1, ..., q_6 in O(1) time: keep the points in each region also sorted by y-coordinate, maintain this order while deleting points outside the delta-strip, and in this list, for each p, consider only 12 neighbors, 6 on each side of the divide.
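A runnable Python sketch of the scheme. For simplicity it re-sorts the strip by y inside each call (giving O(n log^2 n) rather than the O(n log n) achieved by maintaining the y-order as described above), and each strip point checks a small constant number of y-successors, per the sparsity condition:

```python
import math

def closest_pair(pts):
    """Smallest pairwise Euclidean distance among 2D points
    (list of (x, y) tuples), by divide and conquer."""
    def solve(px):                        # px: points sorted by x
        n = len(px)
        if n <= 3:                        # brute force small cases
            return min((math.dist(px[i], px[j])
                        for i in range(n) for j in range(i + 1, n)),
                       default=math.inf)
        mid = n // 2
        xm = px[mid][0]                   # dividing line x = xm
        d = min(solve(px[:mid]), solve(px[mid:]))
        # strip of points within d of the line, sorted by y
        strip = sorted((p for p in px if abs(p[0] - xm) < d),
                       key=lambda p: p[1])
        for i, p in enumerate(strip):     # O(1) neighbors per point
            for q in strip[i + 1:i + 8]:
                if q[1] - p[1] >= d:
                    break
                d = min(d, math.dist(p, q))
        return d
    return solve(sorted(pts))
```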

42 Matrix Multiplication

The product Z of two n x n matrices X and Y is a third n x n matrix Z = XY, with

    Z_ij = sum_{k=1}^{n} X_ik Y_kj

i.e., Z_ij is the dot product of the i-th row of X with the j-th column of Y. (Note that XY is not the same as YX: matrix multiplication is not commutative.) The preceding formula directly yields an O(n^3) algorithm.

43 Divide-and-conquer Matrix Multiplication

Divide X and Y into four n/2 x n/2 submatrices:

    X = [ A B ]   and   Y = [ E F ]
        [ C D ]             [ G H ]

Recursively invoke matrix multiplication on these submatrices:

    XY = [ A B ] [ E F ] = [ AE + BG   AF + BH ]
         [ C D ] [ G H ]   [ CE + DG   CF + DH ]

Divided, but did not conquer! T(n) = 8T(n/2) + O(n^2), which is still O(n^3).

44 Strassen's Matrix Multiplication

Strassen showed that 7 multiplications are enough:

    XY = [ P5 + P4 - P2 + P6    P1 + P2           ]
         [ P3 + P4              P1 + P5 - P3 - P7 ]

where
    P1 = A(F - H)        P5 = (A + D)(E + H)
    P2 = (A + B)H        P6 = (B - D)(G + H)
    P3 = (C + D)E        P7 = (A - C)(E + F)
    P4 = D(G - E)

Now the recurrence T(n) = 7T(n/2) + O(n^2) has an O(n^(log_2 7)) = O(n^2.81) solution! The best complexity to date is about O(n^2.4), but that algorithm is not very practical.
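The seven products can be checked concretely on the base case where the blocks A..H are scalars, i.e., a 2x2 matrix product. This sketch is only that base case; the full algorithm applies the same formulas recursively to n/2 x n/2 blocks.

```python
def strassen_2x2(X, Y):
    """Multiply two 2x2 matrices using Strassen's 7 products
    (blocks are scalars here)."""
    (A, B), (C, D) = X
    (E, F), (G, H) = Y
    P1 = A * (F - H)
    P2 = (A + B) * H
    P3 = (C + D) * E
    P4 = D * (G - E)
    P5 = (A + D) * (E + H)
    P6 = (B - D) * (G + H)
    P7 = (A - C) * (E + F)
    return [[P5 + P4 - P2 + P6, P1 + P2],
            [P3 + P4,           P1 + P5 - P3 - P7]]
```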

45 Karatsuba's Algorithm

Same high-level strategy as Strassen's (but it predates Strassen).
Divide: split n-digit numbers into halves, each with n/2 digits:
    a = a1 a0 = 2^(n/2) a1 + a0
    b = b1 b0 = 2^(n/2) b1 + b0
    ab = 2^n a1 b1 + 2^(n/2) (a1 b0 + b1 a0) + a0 b0

Key point: instead of 4 multiplications, we can get by with 3, since
    a1 b0 + b1 a0 = (a1 + a0)(b1 + b0) - a1 b1 - a0 b0
Recursively compute a1 b1, a0 b0 and (a1 + a0)(b1 + b0).

The recurrence T(n) = 3T(n/2) + O(n) has an O(n^(log_2 3)) = O(n^1.59) solution!
Note: this trick for using 3 (not 4) multiplications was noted by Gauss (1777-1855) in the context of complex numbers.
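A Python sketch of Karatsuba on nonnegative integers, splitting on bits (base 2) rather than decimal digits; the base-case cutoff of 16 is arbitrary. (In practice Python's built-in * is already fast; this only illustrates the recursion.)

```python
def karatsuba(a, b):
    """Multiply nonnegative integers with 3 recursive multiplications
    per level: hi = a1*b1, lo = a0*b0, and (a1+a0)*(b1+b0)."""
    if a < 16 or b < 16:                         # base case
        return a * b
    half = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> half, a & ((1 << half) - 1)    # a = 2^half * a1 + a0
    b1, b0 = b >> half, b & ((1 << half) - 1)
    hi = karatsuba(a1, b1)
    lo = karatsuba(a0, b0)
    mid = karatsuba(a1 + a0, b1 + b0) - hi - lo  # = a1*b0 + a0*b1
    return (hi << (2 * half)) + (mid << half) + lo
```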

46 Toom-Cook Multiplication

Generalizes Karatsuba: divide into N > 2 parts. More easily understood when integer multiplication is viewed as polynomial multiplication.

47 Integer Multiplication Revisited

An integer represented using digits a_{n-1} ... a_0 over a base d (i.e., 0 <= a_i < d) is very similar to the polynomial

    A(x) = sum_{i=0}^{n-1} a_i x^i

Specifically, the value of the integer is A(d). Integer multiplication follows the same steps as polynomial multiplication:

    a_{n-1}...a_0 * b_{n-1}...b_0 = (A(x) * B(x))(d)

48 Polynomials: Basic Properties

Horner's rule: an n-th degree polynomial sum_{i=0}^{n} a_i x^i can be evaluated in O(n) time:

    ((...((a_n x + a_{n-1}) x + a_{n-2}) x + ... + a_1) x + a_0)

Roots and Interpolation: an n-th degree polynomial A(x) has exactly n roots r_1, ..., r_n. (In general, the r_i are complex and need not be distinct.) It can be represented as a product of linear factors using these roots:

    A(x) = sum_{i=0}^{n} a_i x^i = a_n * prod_{i=1}^{n} (x - r_i)

Alternatively, A(x) can be specified uniquely by specifying n + 1 points (x_i, y_i) on it, i.e., A(x_i) = y_i.
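Horner's rule is a one-loop computation. A minimal Python sketch (coefficients given highest degree first, a convention of this sketch):

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule.
    coeffs = [a_n, ..., a_1, a_0]: n multiplications, n additions."""
    acc = 0
    for a in coeffs:
        acc = acc * x + a
    return acc
```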

49 Operations on Polynomials

    Representation | Add  | Mult
    Coefficients   | O(n) | O(n^2)
    Roots          | ?    | O(n)
    Points         | O(n) | O(n)

Note: the point representation is the best for computation! But usually, only the coefficients are given. Solution: convert to point form by evaluating A(x) at selected points. But conversion defeats the purpose: it requires O(n) evaluations, each taking O(n) time, so we are back to O(n^2) total time.
Toom (and FFT) idea: choose the evaluation points judiciously to speed up evaluation.

50 Matrix representation of Polynomial Evaluation

Given a polynomial A(x) = sum_{t=0}^{n-1} a_t x^t, choose points x_0, ..., x_m for its evaluation. Evaluation can be expressed using matrix multiplication:

    [ p_0 ]   [ 1  x_0  x_0^2  ...  x_0^(n-1) ] [ a_0     ]
    [ p_1 ]   [ 1  x_1  x_1^2  ...  x_1^(n-1) ] [ a_1     ]
    [ p_2 ] = [ 1  x_2  x_2^2  ...  x_2^(n-1) ] [ ...     ]
    [ ... ]   [ ...                           ] [ a_{n-1} ]
    [ p_m ]   [ 1  x_m  x_m^2  ...  x_m^(n-1) ]

51 Multiplication using Point Representation

Let A(x) and B(x) be polynomials representing two numbers.
- Evaluate both polynomials at chosen points x_0, ..., x_m:
      P = XA,  Q = XB
  where P, X, A, Q and B denote the matrices from the last slide.
- Compute the point-wise product:
      [r_0, ..., r_m] = [p_0 q_0, ..., p_m q_m]
- Compute the polynomial C corresponding to R:
      R = XC, so C = X^(-1) R
To avoid overflow, m should be >= degree(A) + degree(B) + 1 for R.

52 Improving complexity

Key problem: the complexity of computing X and its inverse X^(-1).
Toom strategy:
- Use low-degree polynomials. E.g., Toom-2 (= Karatsuba) uses degree 1: it represents an n-bit number as a 2-digit number over the large base d = 2^(n/2).
- Fix the evaluation points for a given degree, so that X and X^(-1) can be precomputed. For Toom-2: x_0 = 0, x_1 = 1, x_2 = infinity. (Define A(infinity) = a_{n-1}.)
- Choose the points so that expensive multiplications can be avoided while computing P = XA, Q = XB and C = X^(-1) R.

Toom-N on n-digit numbers needs 2N - 1 multiplications of n/N-digit numbers:
    T(n) = (2N - 1) T(n/N) + O(n)
which, by the Master theorem, has an O(n^(log_N (2N-1))) solution.

53 Karatsuba revisited as Toom-2

Given the evaluation points x_0 = 0, x_1 = 1, x_2 = infinity:

    X = [ 1 0 ]
        [ 1 1 ]
        [ 0 1 ]

    XA = [ 1 0 ] [ a_0 ] = [ a_0       ]
         [ 1 1 ] [ a_1 ]   [ a_0 + a_1 ]
         [ 0 1 ]           [ a_1       ]

Similarly, XB = [b_0; b_0 + b_1; b_1]. Point-wise multiplication yields:

    R = [ a_0 b_0; (a_0 + a_1)(b_0 + b_1); a_1 b_1 ]

and so on.

54 Limitations of Toom

In principle, the complexity can be reduced to n^(1+eps) for arbitrarily small positive eps by increasing N. In reality, the algorithm itself depends on the choice of N; specifically, the constant factors involved increase rapidly with N. As a practical matter, N = 4 or 5 is where we stop.
Question: can we go farther?

55 FFT and Schonhage-Strassen

Key idea: evaluate the polynomial on the complex plane. Choose powers of the N-th complex root of unity as the points for evaluation. This enables sharing of operations in computing XA, so that it can be done in O(N log N) time, rather than the O(N^2) time needed for the naive matrix-multiplication-based approach.

56 FFT to the Rescue!

Matrix form of the DFT, interpreted as polynomial evaluation:

    [ s_0     ]   [ 1  1        1            ...  1                ] [ a_0     ]
    [ s_1     ]   [ 1  w        w^2          ...  w^(N-1)          ] [ a_1     ]
    [ ...     ] = [ ...                                            ] [ ...     ]
    [ s_j     ]   [ 1  w^j      w^(2j)       ...  w^(j(N-1))       ] [ a_j     ]
    [ ...     ]   [ ...                                            ] [ ...     ]
    [ s_{N-1} ]   [ 1  w^(N-1)  w^(2(N-1))   ...  w^((N-1)(N-1))   ] [ a_{N-1} ]

Voila! The FFT computes A(x) at N points (x_i = w^i) in O(N log N) time!

O(N log N) integer multiplication:
- Convert to point representation using FFT: O(N log N)
- Multiply in point representation: O(N)
- Convert back to coefficients using FFT^(-1): O(N log N)

57 FFT to the Rescue! (Continued)

(The same DFT matrix equation as on the previous slide, with n in place of N.)

The FFT can be thought of as a clever way to choose the evaluation points: evaluations at many distinct points collapse together. This is why we are left with 2T(n/2) work after division, instead of 4T(n/2) for a naive choice of points.
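The evaluate / multiply point-wise / interpolate pipeline can be sketched with a recursive radix-2 FFT over the complex roots of unity. This is a floating-point sketch (hence the rounding at the end), not the exact modular version discussed on the next slide; it assumes the input length is padded to a power of 2.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT: evaluate the polynomial with coefficients
    a (len(a) a power of 2) at the len(a)-th roots of unity.
    With invert=True, the twiddle sign flips: dividing the result by
    len(a) then gives the inverse transform."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)           # subproblem on even coefficients
    odd = fft(a[1::2], invert)            # subproblem on odd coefficients
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]     # the two "collapsed" points
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mult(a, b):
    """Multiply coefficient lists (lowest degree first): FFT both,
    multiply point-wise, inverse FFT, round back to integers."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(c.real / n) for c in prod[:len(a) + len(b) - 1]]
```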

58 FFT-based multiplication: a more careful analysis

Computations on complex or real numbers can lose precision. For integer operations, we should work in some other ring; usually, we choose a ring based on modular arithmetic. E.g., in mod-33 arithmetic, 2 is a 10th root of 1, i.e., 2^10 = 1 (mod 33). More generally, 2 is an n-th root of unity modulo 2^(n/2) + 1.
Point-wise additions and multiplications are not O(1): we are adding up to n numbers ("digits"), so we need Omega(log n) bits. Thus, the total cost increases by at least a log n factor, i.e., O(n log^2 n).
[Schonhage-Strassen 71] developed an O(n log n log log n) algorithm: recursively apply their technique for the inner operations.

59 Integer Multiplication Summary

These algorithms are implemented in libraries for arbitrary-precision arithmetic, with applications in public-key cryptography, computer algebra systems, etc. GNU MP is a popular library; it uses various algorithms based on input size: naive, Karatsuba, Toom-3, Toom-4, or Schonhage-Strassen (at about 50K digits).
- Karatsuba is Toom-2
- Toom-N is based on evaluating a polynomial at 2N - 1 points, performing point-wise multiplication, and interpolating to get back the polynomial, while minimizing the operations needed for interpolation

60 / 81: Fast Fourier Transformation

One of the most widely used algorithms, yet most people are unaware of its use!
- Solving differential equations: applied to many computational problems in engineering, e.g., heat transfer
- Audio: MP3, digital audio processors, music/speech synthesizers, speech recognition, ...
- Image and video: JPEG, MPEG, vision, ...
- Communication: modulation, filtering, radars, software-defined radios, H.264, ...
- Medical diagnostics: MRI, PET, ultrasound, ...
- Quantum computing: see text, Ch. 10
- Other: optics, data compression, seismology, ...

61 / 81: Fourier Series

Theorem (Fourier Theorem): Any (sufficiently smooth) function with period T can be expressed as a sum of a series of sinusoids with periods T/n, for integral n:

  a(t) = Σ_{n=0}^{∞} (d_n sin(2πnt/T) + e_n cos(2πnt/T))

[Images from Wikimedia Commons]
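To make the theorem concrete, here is a numerical sketch (not from the slides) using the standard Fourier series of a square wave of period 2π, whose sine coefficients are d_n = 4/(πn) for odd n and 0 otherwise:

```python
import math

def partial_sum(t, terms):
    """Partial Fourier sum for the square wave a(t) = sign(sin t):
    sum of (4 / (π n)) * sin(n t) over the first `terms` odd n."""
    return sum(4 / (math.pi * n) * math.sin(n * t)
               for n in range(1, 2 * terms, 2))

# On (0, π) the square wave equals +1; the partial sums converge to it.
assert abs(partial_sum(math.pi / 2, 1000) - 1.0) < 1e-2
```

Even a discontinuous function is approached (pointwise, away from the jumps) by a sum of smooth sinusoids, which is what makes the theorem so useful.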

62 / 81: Fourier Series Example: Touch-tone button 1

[Figures: the time-domain signal and frequency-domain spectrum of touch-tone button 1; images from Cleve Moler, Numerical Computing with MATLAB]

Using the identity e^{ix} = cos x + i sin x, the Fourier series becomes

  a(t) = Σ_{n=−∞}^{∞} c_n e^{2πint/T}

It can be shown that

  c_n = (1/T) ∫_0^T a(t) e^{−2πint/T} dt

For real a(t), c_{−n} = c_n* (the complex conjugate)

63 / 81: Fourier Transform

What if a is not periodic? Maybe we can start with the Fourier series definition for c_n,

  c_n = (1/T) ∫_0^T a(t) e^{−2πint/T} dt

and let T → ∞?
- Frequencies are not discrete any more, as the fundamental frequency f = 1/T → 0
- Instead of discrete coefficients c_n, we will have a continuous function, call it s(f):

  s(f) = ∫_{−∞}^{∞} a(t) e^{−2πift} dt

- F(a) denotes a's Fourier transform
- F is almost self-inverting: F(F(a(t))) = a(−t)

64 / 81: How do Fourier Series/Transforms help?

- Differential equations: turn non-integrable functions into a sum of easily integrable ones
- Some problems are easier to solve in the frequency domain:
  - Filtering: filter out noise, tuning, ...
  - Compression: eliminate high-frequency components, ...
- Convolution: convolution in the time domain becomes (simpler) multiplication in the frequency domain

Definition (Convolution): (a ⊛ b)(t) = ∫ a(t−x) b(x) dx

Theorem (Convolution): F((a ⊛ b)(t)) = F(a(t)) · F(b(t))

65 / 81: Discrete Fourier Transform

- Real-world signals are typically sampled; the DFT is a formulation of the FT applicable to such samples
- Nyquist rate: a signal with highest frequency n/2 can be losslessly reconstructed from n samples
- The DFT of time-domain samples a_0, ..., a_{n−1} yields frequency-domain samples s_0, ..., s_{n−1}:

    s_f = Σ_{t=0}^{n−1} a_t e^{−2πift/n}    (cf. s(f) = ∫ a(t) e^{−2πift} dt)

- Note: the DFT formulation can be derived from the FT by treating the sampling process as multiplication by a sequence of impulse functions separated by the sampling interval
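The defining sum above translates directly into code. A minimal sketch of the naive O(n²) DFT, used here only to show that a pure sampled tone lands in a single frequency bin:

```python
import cmath

def dft(a):
    """Naive O(n^2) DFT: s_f = sum over t of a_t * e^{-2πi f t / n}."""
    n = len(a)
    return [sum(a[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n))
            for f in range(n)]

# A pure tone at frequency 3, sampled n = 8 times over one period:
n = 8
tone = [cmath.exp(2j * cmath.pi * 3 * t / n) for t in range(n)]
spectrum = dft(tone)
assert abs(spectrum[3] - n) < 1e-9                     # all energy in bin 3
assert all(abs(spectrum[f]) < 1e-9 for f in range(n) if f != 3)
```

The FFT slides that follow compute exactly this vector, but in O(n log n).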

66 / 81: Background: Complex Plane, Polar Coordinates

[Figure 2.6 of the text: the complex roots of unity are ideal for our divide-and-conquer scheme]

- The complex number z = a + bi is plotted at position (a, b) in the complex plane
- Polar coordinates: rewrite as z = r(cos θ + i sin θ) = re^{iθ}, denoted (r, θ)
  - length r = √(a² + b²)
  - angle θ ∈ [0, 2π): cos θ = a/r, sin θ = b/r
  - θ can always be reduced modulo 2π
- Examples:
    Number:        −1        i         5 + 5i
    Polar coords:  (1, π)    (1, π/2)  (5√2, π/4)
- Multiplying is easy in polar coordinates: multiply the lengths and add the angles, (r_1, θ_1) × (r_2, θ_2) = (r_1 r_2, θ_1 + θ_2)
- For any z = (r, θ), −z = (r, θ + π), since −1 = (1, π)
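Python's cmath exposes exactly this polar view, which makes the slide's examples and the multiplication rule easy to check (an illustrative sketch; `polar_mul` is not from the slides):

```python
import cmath
import math

def polar_mul(z1, z2):
    """Multiply by the polar rule: lengths multiply, angles add."""
    r1, t1 = cmath.polar(z1)
    r2, t2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, t1 + t2)

# Slide examples: -1 = (1, π), i = (1, π/2), 5+5i = (5√2, π/4).
assert abs(cmath.polar(5 + 5j)[0] - math.hypot(5, 5)) < 1e-12   # r = 5√2
assert abs(cmath.polar(5 + 5j)[1] - math.pi / 4) < 1e-12        # θ = π/4

# (1, π) × (1, π/2) = (1, 3π/2), i.e., (-1) * i = -i:
assert abs(polar_mul(-1, 1j) - (-1j)) < 1e-12
```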

67 / 81: Polar Coordinates and Multiplication

- Multiply the lengths and add the angles: (r_1, θ_1) × (r_2, θ_2) = (r_1 r_2, θ_1 + θ_2)
- For any z = (r, θ), −z = (r, θ + π), since −1 = (1, π)
- If z is on the unit circle (i.e., r = 1), then z^n = (1, nθ)

The nth complex roots of unity:
- Solutions to the equation z^n = 1
- By the multiplication rule: the solutions are z = (1, θ), for θ a multiple of 2π/n (shown in the figure for n = 16)
- For even n: these numbers are plus-minus paired: −(1, θ) = (1, θ + π)

68 / 81: Roots of Unity on the Complex Plane

- The nth complex roots of unity are the solutions to z^n = 1: the points z = (1, θ) for θ a multiple of 2π/n
- For even n:
  - These numbers are plus-minus paired: −(1, θ) = (1, θ + π)
  - Their squares are the (n/2)nd roots of unity (shown in the figure with boxes around them)
- Divide-and-conquer step: to evaluate A(x) at the nth roots of unity, evaluate A_e(x), A_o(x) at the (n/2)nd roots
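Both structural facts the divide-and-conquer step relies on, the plus-minus pairing and the collapse of squares, can be verified numerically (a sketch, not from the slides):

```python
import cmath

n = 16
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# Each is a solution of z^n = 1.
assert all(abs(z ** n - 1) < 1e-9 for z in roots)

# Plus-minus paired: advancing n/2 steps (rotating by π) negates.
assert all(abs(roots[k + n // 2] + roots[k]) < 1e-9 for k in range(n // 2))

# Their squares are the (n/2)nd roots of unity: 16 squares, but only
# 8 distinct values, each satisfying z^(n/2) = 1.
squares = [z * z for z in roots]
assert all(abs(z ** (n // 2) - 1) < 1e-9 for z in squares)
distinct = {(round(z.real, 6), round(z.imag, 6)) for z in squares}
assert len(distinct) == n // 2
```

This collapse is why evaluating at n special points costs only two subproblems on n/2 points each.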

69 / 81: Matrix Representation of DFT

Given time-domain samples a_t for t = 0, 1, ..., n−1, compute frequency-domain samples s_f for f = 0, 1, ..., n−1:

  s_f = Σ_{t=0}^{n−1} a_t e^{−2πift/n} = Σ_{t=0}^{n−1} a_t (e^{−2πi/n})^{ft} = Σ_{t=0}^{n−1} a_t ω^{ft}

where ω = e^{−2πi/n} is an nth complex root of unity. In matrix form:

  [s_0]       [1   1        1           ...  1               ] [a_0]
  [s_1]       [1   ω        ω^2         ...  ω^(n−1)         ] [a_1]
  [s_2]       [1   ω^2      ω^4         ...  ω^(2(n−1))      ] [a_2]
  [ ⋮ ]   =   [⋮                                             ] [ ⋮ ]
  [s_j]       [1   ω^j      ω^(2j)      ...  ω^(j(n−1))      ] [a_j]
  [ ⋮ ]       [⋮                                             ] [ ⋮ ]
  [s_(n−1)]   [1   ω^(n−1)  ω^(2(n−1))  ...  ω^((n−1)(n−1))  ] [a_(n−1)]

70 / 81: Matrix Representation of DFT (contd.)

Note that e^{−2πi/n} represents an nth root of 1, denoted ω.

  s = M_n(ω) · a,  where M_n(ω) has (j, k)th entry ω^{jk}

Possible interpretations of this matrix equation:
- Simultaneous equations that can be solved
- Change of basis (rotate the coordinate system)
- Evaluation of the polynomial Σ_{k=0}^{n−1} a_k x^k at x = ω^j, for 0 ≤ j < n

71 / 81: Speeding up FFT Computation

The matrix multiplication formulation has an obvious divide-and-conquer implementation:

  M_n · A_{0..(n−1)} = [ M11_{n/2}  M12_{n/2} ] [ A_{0..(n/2−1)}  ]
                       [ M21_{n/2}  M22_{n/2} ] [ A_{n/2..(n−1)}  ]

But this algorithm still takes O(n²) time...

... but wait! There are only O(n) distinct elements in the square matrix M_n, i.e., O(n) repetitions of each element, so there is significant scope for sharing operations on the submatrices!
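The "only O(n) distinct elements" claim needs no complex arithmetic to check: since ω^n = 1, the entry at (j, k) depends only on jk mod n. A tiny sketch:

```python
n = 16
# M_n has n^2 entries ω^{jk}, but only the exponent mod n matters,
# so at most n distinct values appear among the n^2 entries.
exponents = {(j * k) % n for j in range(n) for k in range(n)}
assert exponents == set(range(n))   # all n residues occur, and no more
```

So each of the n distinct powers of ω is repeated about n times across the matrix, which is exactly the redundancy the FFT exploits.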

72 / 81: Observations about M_n(ω)

M_n(ω) has (j, k)th entry ω^{jk} (starting row and column count at zero). The potential for divide-and-conquer in this matrix-vector multiplication becomes apparent when the columns are segregated into evens and odds:

- In row j, column 2k holds ω^{2jk} and column 2k+1 holds ω^j · ω^{2jk}: two successive columns differ by a factor of ω^j in the jth row
- Rows that are n/2 apart differ by a factor of ω^{kn/2} in the kth column. Note that ω^{n/2} = −1 (since ω^n = 1), so rows j and j+n/2 differ by a factor of −1 on odd columns, and are identical on even columns
- The top-left n/2 × n/2 submatrix (even columns, top rows) is M_{n/2}(ω²), as is the one on the bottom left; the top-right and bottom-right submatrices are almost the same as M_{n/2}(ω²), but with their jth rows multiplied through by ω^j and −ω^j, respectively
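All three observations can be checked entry by entry on a small matrix (an illustrative sketch; the helper `M` is not from the slides):

```python
import cmath

def M(n, w):
    """DFT-style matrix with (j, k) entry w^{jk}."""
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

n = 8
w = cmath.exp(-2j * cmath.pi / n)   # a primitive nth root of unity
Mn = M(n, w)
Mh = M(n // 2, w * w)               # M_{n/2}(ω²)

for j in range(n // 2):
    for k in range(n // 2):
        # Even columns of the top rows reproduce M_{n/2}(ω²)...
        assert abs(Mn[j][2 * k] - Mh[j][k]) < 1e-9
        # ...odd columns are the same entries scaled by ω^j.
        assert abs(Mn[j][2 * k + 1] - w ** j * Mh[j][k]) < 1e-9
        # Rows j and j+n/2: identical on even columns, negated on odd.
        assert abs(Mn[j + n // 2][2 * k] - Mn[j][2 * k]) < 1e-9
        assert abs(Mn[j + n // 2][2 * k + 1] + Mn[j][2 * k + 1]) < 1e-9
```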

73 / 81: FFT Algorithm

function FFT(a, ω)
  Input:  an array a = (a_0, a_1, ..., a_{n−1}), for n a power of 2
          a primitive nth root of unity, ω
  Output: M_n(ω) · a

  if ω = 1: return a
  (s_0, s_1, ..., s_{n/2−1})    = FFT((a_0, a_2, ..., a_{n−2}), ω²)
  (s'_0, s'_1, ..., s'_{n/2−1}) = FFT((a_1, a_3, ..., a_{n−1}), ω²)
  for j = 0 to n/2 − 1:
    r_j       = s_j + ω^j s'_j
    r_{j+n/2} = s_j − ω^j s'_j
  return (r_0, r_1, ..., r_{n−1})

- Only two subproblems of size n/2: multiply M_{n/2} by A_even and A_odd. So T(n) = 2T(n/2) + O(n), with an O(n log n) solution
- But wait! M_n has O(n²) size; how can we operate on it in O(n) time?

74 / 81: FFT Algorithm, illustrated

[Figure 2.9 of the text: the fast Fourier transform; the pseudocode is as on the previous slide]

Pictorially, the product M_n(ω) · a decomposes into two half-size products, one on the even-indexed and one on the odd-indexed coefficients:

  row j:        r_j       = [M_{n/2} · (a_0, a_2, ..., a_{n−2})]_j + ω^j [M_{n/2} · (a_1, a_3, ..., a_{n−1})]_j
  row j + n/2:  r_{j+n/2} = [M_{n/2} · (a_0, a_2, ..., a_{n−2})]_j − ω^j [M_{n/2} · (a_1, a_3, ..., a_{n−1})]_j
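A direct Python transliteration of the pseudocode, cross-checked against the naive matrix-vector product. One deliberate change from the slide, noted as an assumption: the recursion bottoms out at n = 1 instead of testing ω = 1, which is more robust with floating-point ω.

```python
import cmath

def fft(a, w):
    """Compute M_n(ω)·a, i.e., evaluate the polynomial with coefficients a
    at 1, ω, ω², ...; len(a) must be a power of 2, w a primitive root."""
    n = len(a)
    if n == 1:
        return list(a)
    s_even = fft(a[0::2], w * w)   # FFT of even-indexed coefficients
    s_odd = fft(a[1::2], w * w)    # FFT of odd-indexed coefficients
    r = [0j] * n
    wj = 1
    for j in range(n // 2):
        r[j] = s_even[j] + wj * s_odd[j]
        r[j + n // 2] = s_even[j] - wj * s_odd[j]
        wj *= w
    return r

# Cross-check against the naive O(n^2) definition s_f = Σ_t a_t ω^{ft}.
n = 8
w = cmath.exp(-2j * cmath.pi / n)
a = [1, 2, 3, 4, 5, 6, 7, 8]
naive = [sum(a[t] * w ** (f * t) for t in range(n)) for f in range(n)]
fast = fft(a, w)
assert all(abs(x - y) < 1e-9 for x, y in zip(fast, naive))
```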

75 / 81: Convolution in the Discrete World

  (A_n ⊛ B_m)_t = Σ_{x=0}^{m−1} a_{t−x} b_x    (cf. (a ⊛ b)(t) = ∫ a(t−x) b(x) dx)

- Linear convolution: a_{t−x} = 0 if x > t
- Circular convolution: a_{t−x} = a_{t−x+n} if x > t (equivalent to treating A as a periodic function)
- Zero-extended convolution: first extend A and B to have m + n − 1 samples, by letting a_y = 0 for n ≤ y < m + n − 1 and b_z = 0 for m ≤ z < m + n − 1
- With zero-extension, the definitions of linear and circular convolution match, and hence become equivalent; so we will deal only with zero-extended convolution

Theorem (Discrete Convolution): F(A_n ⊛ B_m) = F(A_n) · F(B_m)
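The defining sum is a short double loop. A sketch of zero-extended (linear) convolution, together with the observation, used later for integer multiplication, that convolving digit sequences multiplies the numbers they represent (before carrying):

```python
def conv(a, b):
    """Zero-extended convolution: c_t = sum over x of a_{t-x} * b_x,
    with out-of-range a indices treated as 0."""
    n = len(a) + len(b) - 1
    c = [0] * n
    for t in range(n):
        for x in range(len(b)):
            if 0 <= t - x < len(a):
                c[t] += a[t - x] * b[x]
    return c

# Digits of 12 and 34, little-endian: conv gives the digit products
# of 12 * 34 = 408 before carrying: 8 + 10*10 + 3*100 = 408.
assert conv([2, 1], [4, 3]) == [8, 10, 3]
```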

76 / 81: Why this fascination with convolution?

- Computationally, convolution is a loop that adds up products
- The convolution theorem says we can replace this O(n) loop (per output sample) by a single multiplication on the DFTs. That is fascinating!
- Wait a minute! What about the cost of computing F first? If we use the FFT, then the computation of F and its inverse will still be O(n log n), not quadratic
- Can we use the FFT as a building block to speed up algorithms for other problems? Integer multiplication looks like a convolution, and usually takes O(n²). Can we make it O(n log n)?
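The discrete convolution theorem can be demonstrated end to end: transform, multiply point-wise, transform back. A sketch using a naive DFT for clarity (the FFT would be a drop-in replacement); the `dft` helper here is illustrative, not from the slides:

```python
import cmath

def dft(a, inverse=False):
    """Naive O(n^2) DFT, or inverse DFT (conjugate kernel, divide by n)."""
    n, sign = len(a), (1 if inverse else -1)
    out = [sum(a[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
               for t in range(n))
           for f in range(n)]
    return [x / n for x in out] if inverse else out

# Zero-extend both sequences to m + n - 1 samples, then the point-wise
# product of their DFTs is the DFT of their convolution.
a, b = [1, 2, 3], [4, 5]
n = len(a) + len(b) - 1
A = dft(a + [0] * (n - len(a)))
B = dft(b + [0] * (n - len(b)))
c = [round(x.real) for x in dft([p * q for p, q in zip(A, B)], inverse=True)]
assert c == [4, 13, 22, 15]   # direct convolution: (1+2x+3x²)(4+5x)
```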

77 / 81: FFT to the Rescue!

Matrix form of the DFT, and its interpretation as polynomial evaluation:

  [s_0]       [1   1        1           ...  1               ] [a_0]
  [s_1]       [1   ω        ω^2         ...  ω^(n−1)         ] [a_1]
  [ ⋮ ]   =   [⋮                                             ] [ ⋮ ]
  [s_j]       [1   ω^j      ω^(2j)      ...  ω^(j(n−1))      ] [a_j]
  [ ⋮ ]       [⋮                                             ] [ ⋮ ]
  [s_(n−1)]   [1   ω^(n−1)  ω^(2(n−1))  ...  ω^((n−1)(n−1))  ] [a_(n−1)]

Voila! FFT computes A(x) at n points (x_i = ω^i) in O(n log n) time!

O(n log n) integer multiplication:
- Convert to point representation using FFT: O(n log n)
- Multiply in the point representation: O(n)
- Convert back to coefficients using FFT⁻¹: O(n log n)

78 / 81: FFT to the Rescue! (contd.)

  s = M_n(ω) · a,  where M_n(ω) has (j, k)th entry ω^{jk}

FFT can be thought of as a clever way to choose the evaluation points:
- Evaluations at many distinct points collapse together
- This is why we are left with 2T(n/2) work after division, instead of 4T(n/2) for a naive choice of points

79 / 81: FFT-based Multiplication: Summary

- FFT works with 2^k points: this increases work by up to 2×
- The product of two nth-degree polynomials has degree 2n: we need to work with 2n points, i.e., a 4× increase in time
- Requires inverting back to the coefficient representation after multiplication:

    S_n = M_n(ω) A_n  ⟹  M_n^{−1}(ω) S_n = M_n^{−1}(ω) M_n(ω) A_n = A_n

- It is easy to show that M_n^{−1}(ω) = M_n(ω^{−1})/n, and hence:

    A_n ⊛ B_n = FFT(FFT(A_{2n}, ω) · FFT(B_{2n}, ω), ω^{−1}) / n

- We are back to the convolution theorem!
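The closing formula, FFT of the point-wise product of two FFTs, evaluated at ω^{−1} and divided by n, can be sketched directly in Python. This is an illustrative implementation (helper names `fft`/`poly_mul` are mine), padding to the next power of 2 as the summary requires:

```python
import cmath

def fft(a, w):
    """Recursive FFT: evaluate coefficients a at powers of w.
    len(a) must be a power of 2; w a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    se = fft(a[0::2], w * w)
    so = fft(a[1::2], w * w)
    # For j >= n/2, w^j = -w^{j-n/2}, so the same half-size results reappear.
    return [se[j % (n // 2)] + w ** j * so[j % (n // 2)] for j in range(n)]

def poly_mul(a, b):
    """Multiply coefficient vectors via FFT(FFT(a,ω)·FFT(b,ω), ω^{-1})/n."""
    n = 1
    while n < len(a) + len(b) - 1:      # need >= deg(product)+1 points,
        n *= 2                          # rounded up to a power of 2
    w = cmath.exp(2j * cmath.pi / n)
    A = fft(a + [0] * (n - len(a)), w)
    B = fft(b + [0] * (n - len(b)), w)
    C = fft([x * y for x, y in zip(A, B)], 1 / w)   # M_n(ω^{-1}) · (A·B)
    return [round((z / n).real) for z in C[:len(a) + len(b) - 1]]

assert poly_mul([1, 2, 3], [4, 5]) == [4, 13, 22, 15]
```

Treating the digit sequences of two integers as polynomial coefficients, this plus a final carry pass is the FFT-based integer multiplication the slides describe.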

80 / 81: More careful analysis

- Computations on complex or real numbers can lose precision
- For integer operations, we should work in some other ring; usually, we choose a ring based on modulo arithmetic
  - Ex: in mod 33 arithmetic, 2 is a 10th root of 1, i.e., 2^10 = 1024 ≡ 1 (mod 33)
  - More generally, 2 is an nth root of unity modulo (2^{n/2} + 1)
- Point-wise additions and multiplications are not O(1)
  - We are adding up to n numbers ("digits"), so we need Ω(log n) bits
  - So the total cost increases by at least a log n factor, i.e., O(n log² n)
- [Schönhage-Strassen '71] developed an O(n log n log log n) algorithm: recursively apply their technique for the inner operations

81 / 81: Integer Multiplication Summary

- These algorithms are implemented in libraries for arbitrary-precision arithmetic, with applications in public-key cryptography, computer algebra systems, etc.
- GNU MP is a popular library; it uses various algorithms based on input size: naive, Karatsuba, Toom-3, Toom-4, or Schönhage-Strassen (at about 50K digits)
- Karatsuba is Toom-2
- Toom-N is based on evaluating a polynomial at 2N points, performing point-wise multiplication, and interpolating to get back the polynomial, while minimizing the operations needed for interpolation


A design paradigm. Divide and conquer: (When) does decomposing a problem into smaller parts help? 09/09/ EECS 3101 A design paradigm Divide and conquer: (When) does decomposing a problem into smaller parts help? 09/09/17 112 Multiplying complex numbers (from Jeff Edmonds slides) INPUT: Two pairs of integers, (a,b),

More information

Algorithms. Quicksort. Slide credit: David Luebke (Virginia)

Algorithms. Quicksort. Slide credit: David Luebke (Virginia) 1 Algorithms Quicksort Slide credit: David Luebke (Virginia) Sorting revisited We have seen algorithms for sorting: INSERTION-SORT, MERGESORT More generally: given a sequence of items Each item has a characteristic

More information

Class Note #14. In this class, we studied an algorithm for integer multiplication, which. 2 ) to θ(n

Class Note #14. In this class, we studied an algorithm for integer multiplication, which. 2 ) to θ(n Class Note #14 Date: 03/01/2006 [Overall Information] In this class, we studied an algorithm for integer multiplication, which improved the running time from θ(n 2 ) to θ(n 1.59 ). We then used some of

More information

Fast Fourier Transform

Fast Fourier Transform Why Fourier Transform? Fast Fourier Transform Jordi Cortadella and Jordi Petit Department of Computer Science Polynomials: coefficient representation Divide & Conquer Dept. CS, UPC Polynomials: point-value

More information

COMP 382: Reasoning about algorithms

COMP 382: Reasoning about algorithms Fall 2014 Unit 4: Basics of complexity analysis Correctness and efficiency So far, we have talked about correctness and termination of algorithms What about efficiency? Running time of an algorithm For

More information

shelat 16f-4800 sep Matrix Mult, Median, FFT

shelat 16f-4800 sep Matrix Mult, Median, FFT L5 shelat 16f-4800 sep 23 2016 Matrix Mult, Median, FFT merge-sort (A, p, r) if p

More information

CPS 616 DIVIDE-AND-CONQUER 6-1

CPS 616 DIVIDE-AND-CONQUER 6-1 CPS 616 DIVIDE-AND-CONQUER 6-1 DIVIDE-AND-CONQUER Approach 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain solution to original (larger)

More information

Divide-and-conquer. Curs 2015

Divide-and-conquer. Curs 2015 Divide-and-conquer Curs 2015 The divide-and-conquer strategy. 1. Break the problem into smaller subproblems, 2. recursively solve each problem, 3. appropriately combine their answers. Known Examples: Binary

More information

Lecture 1: Asymptotics, Recurrences, Elementary Sorting

Lecture 1: Asymptotics, Recurrences, Elementary Sorting Lecture 1: Asymptotics, Recurrences, Elementary Sorting Instructor: Outline 1 Introduction to Asymptotic Analysis Rate of growth of functions Comparing and bounding functions: O, Θ, Ω Specifying running

More information

Asymptotic Analysis and Recurrences

Asymptotic Analysis and Recurrences Appendix A Asymptotic Analysis and Recurrences A.1 Overview We discuss the notion of asymptotic analysis and introduce O, Ω, Θ, and o notation. We then turn to the topic of recurrences, discussing several

More information

Data selection. Lower complexity bound for sorting

Data selection. Lower complexity bound for sorting Data selection. Lower complexity bound for sorting Lecturer: Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 12 1 Data selection: Quickselect 2 Lower complexity bound for sorting 3 The

More information

Divide and Conquer CPE 349. Theresa Migler-VonDollen

Divide and Conquer CPE 349. Theresa Migler-VonDollen Divide and Conquer CPE 349 Theresa Migler-VonDollen Divide and Conquer Divide and Conquer is a strategy that solves a problem by: 1 Breaking the problem into subproblems that are themselves smaller instances

More information

Data Structures and Algorithms Chapter 3

Data Structures and Algorithms Chapter 3 1 Data Structures and Algorithms Chapter 3 Werner Nutt 2 Acknowledgments The course follows the book Introduction to Algorithms, by Cormen, Leiserson, Rivest and Stein, MIT Press [CLRST]. Many examples

More information

CS173 Running Time and Big-O. Tandy Warnow

CS173 Running Time and Big-O. Tandy Warnow CS173 Running Time and Big-O Tandy Warnow CS 173 Running Times and Big-O analysis Tandy Warnow Today s material We will cover: Running time analysis Review of running time analysis of Bubblesort Review

More information

Chapter 5. Divide and Conquer. CS 350 Winter 2018

Chapter 5. Divide and Conquer. CS 350 Winter 2018 Chapter 5 Divide and Conquer CS 350 Winter 2018 1 Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively. Combine solutions to sub-problems into overall

More information

Algorithms And Programming I. Lecture 5 Quicksort

Algorithms And Programming I. Lecture 5 Quicksort Algorithms And Programming I Lecture 5 Quicksort Quick Sort Partition set into two using randomly chosen pivot 88 31 25 52 14 98 62 30 23 79 14 31 2530 23 52 88 62 98 79 Quick Sort 14 31 2530 23 52 88

More information

Divide and Conquer. Andreas Klappenecker. [based on slides by Prof. Welch]

Divide and Conquer. Andreas Klappenecker. [based on slides by Prof. Welch] Divide and Conquer Andreas Klappenecker [based on slides by Prof. Welch] Divide and Conquer Paradigm An important general technique for designing algorithms: divide problem into subproblems recursively

More information

CS361 Homework #3 Solutions

CS361 Homework #3 Solutions CS6 Homework # Solutions. Suppose I have a hash table with 5 locations. I would like to know how many items I can store in it before it becomes fairly likely that I have a collision, i.e., that two items

More information

1 Maintaining a Dictionary

1 Maintaining a Dictionary 15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition

More information

Solving Recurrences. Lecture 23 CS2110 Fall 2011

Solving Recurrences. Lecture 23 CS2110 Fall 2011 Solving Recurrences Lecture 23 CS2110 Fall 2011 1 Announcements Makeup Prelim 2 Monday 11/21 7:30-9pm Upson 5130 Please do not discuss the prelim with your classmates! Quiz 4 next Tuesday in class Topics:

More information

feb abhi shelat Matrix, FFT

feb abhi shelat Matrix, FFT L7 feb 11 2016 abhi shelat Matrix, FFT userid: = Using the standard method, how many multiplications does it take to multiply two NxN matrices? cos( /4) = cos( /2) = sin( /4) = sin( /2) = Mergesort Karatsuba

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 12 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted for noncommercial,

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort Sorting Algorithms We have already seen: Selection-sort Insertion-sort Heap-sort We will see: Bubble-sort Merge-sort Quick-sort We will show that: O(n log n) is optimal for comparison based sorting. Bubble-Sort

More information

Solutions. Problem 1: Suppose a polynomial in n of degree d has the form

Solutions. Problem 1: Suppose a polynomial in n of degree d has the form Assignment 1 1. Problem 3-1 on p. 57 2. Problem 3-2 on p. 58 3. Problem 4-5 on p. 86 4. Problem 6-1 on p. 142 5. Problem 7-4 on p. 162 6. Prove correctness (including halting) of SelectionSort (use loop

More information

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: October 11. In-Class Midterm. ( 7:05 PM 8:20 PM : 75 Minutes )

CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: October 11. In-Class Midterm. ( 7:05 PM 8:20 PM : 75 Minutes ) CSE548, AMS542: Analysis of Algorithms, Fall 2017 Date: October 11 In-Class Midterm ( 7:05 PM 8:20 PM : 75 Minutes ) This exam will account for either 15% or 30% of your overall grade depending on your

More information

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3

MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 MA008 p.1/37 MA008/MIIZ01 Design and Analysis of Algorithms Lecture Notes 3 Dr. Markus Hagenbuchner markus@uow.edu.au. MA008 p.2/37 Exercise 1 (from LN 2) Asymptotic Notation When constants appear in exponents

More information

Analysis of Algorithms - Using Asymptotic Bounds -

Analysis of Algorithms - Using Asymptotic Bounds - Analysis of Algorithms - Using Asymptotic Bounds - Andreas Ermedahl MRTC (Mälardalens Real-Time Research Center) andreas.ermedahl@mdh.se Autumn 004 Rehersal: Asymptotic bounds Gives running time bounds

More information

CSCI 3110 Assignment 6 Solutions

CSCI 3110 Assignment 6 Solutions CSCI 3110 Assignment 6 Solutions December 5, 2012 2.4 (4 pts) Suppose you are choosing between the following three algorithms: 1. Algorithm A solves problems by dividing them into five subproblems of half

More information

Chapter 2. Divide-and-conquer. 2.1 Strassen s algorithm

Chapter 2. Divide-and-conquer. 2.1 Strassen s algorithm Chapter 2 Divide-and-conquer This chapter revisits the divide-and-conquer paradigms and explains how to solve recurrences, in particular, with the use of the master theorem. We first illustrate the concept

More information

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM

Sorting. Chapter 11. CSE 2011 Prof. J. Elder Last Updated: :11 AM Sorting Chapter 11-1 - Sorting Ø We have seen the advantage of sorted data representations for a number of applications q Sparse vectors q Maps q Dictionaries Ø Here we consider the problem of how to efficiently

More information

CPSC 320 Sample Final Examination December 2013

CPSC 320 Sample Final Examination December 2013 CPSC 320 Sample Final Examination December 2013 [10] 1. Answer each of the following questions with true or false. Give a short justification for each of your answers. [5] a. 6 n O(5 n ) lim n + This is

More information

Algorithms, Design and Analysis. Order of growth. Table 2.1. Big-oh. Asymptotic growth rate. Types of formulas for basic operation count

Algorithms, Design and Analysis. Order of growth. Table 2.1. Big-oh. Asymptotic growth rate. Types of formulas for basic operation count Types of formulas for basic operation count Exact formula e.g., C(n) = n(n-1)/2 Algorithms, Design and Analysis Big-Oh analysis, Brute Force, Divide and conquer intro Formula indicating order of growth

More information

Algorithms and Data Structures

Algorithms and Data Structures Charles A. Wuethrich Bauhaus-University Weimar - CogVis/MMC June 1, 2017 1/44 Space reduction mechanisms Range searching Elementary Algorithms (2D) Raster Methods Shell sort Divide and conquer Quicksort

More information

CS 161 Summer 2009 Homework #2 Sample Solutions

CS 161 Summer 2009 Homework #2 Sample Solutions CS 161 Summer 2009 Homework #2 Sample Solutions Regrade Policy: If you believe an error has been made in the grading of your homework, you may resubmit it for a regrade. If the error consists of more than

More information

Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort

Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort Xi Chen Columbia University We continue with two more asymptotic notation: o( ) and ω( ). Let f (n) and g(n) are functions that map

More information

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts) Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We

More information

University of the Virgin Islands, St. Thomas January 14, 2015 Algorithms and Programming for High Schoolers. Lecture 5

University of the Virgin Islands, St. Thomas January 14, 2015 Algorithms and Programming for High Schoolers. Lecture 5 University of the Virgin Islands, St. Thomas January 14, 2015 Algorithms and Programming for High Schoolers Numerical algorithms: Lecture 5 Today we ll cover algorithms for various numerical problems:

More information

UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis

UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis Lecture 14 Class URL: http://vlsicad.ucsd.edu/courses/cse21-s14/ Lecture 14 Notes Goals for this week Big-O complexity

More information

Algorithms and Their Complexity

Algorithms and Their Complexity CSCE 222 Discrete Structures for Computing David Kebo Houngninou Algorithms and Their Complexity Chapter 3 Algorithm An algorithm is a finite sequence of steps that solves a problem. Computational complexity

More information

The Divide-and-Conquer Design Paradigm

The Divide-and-Conquer Design Paradigm CS473- Algorithms I Lecture 4 The Divide-and-Conquer Design Paradigm CS473 Lecture 4 1 The Divide-and-Conquer Design Paradigm 1. Divide the problem (instance) into subproblems. 2. Conquer the subproblems

More information

Lecture 8 - Algebraic Methods for Matching 1

Lecture 8 - Algebraic Methods for Matching 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 1, 2018 Lecture 8 - Algebraic Methods for Matching 1 In the last lecture we showed that

More information

Computational Complexity - Pseudocode and Recursions

Computational Complexity - Pseudocode and Recursions Computational Complexity - Pseudocode and Recursions Nicholas Mainardi 1 Dipartimento di Elettronica e Informazione Politecnico di Milano nicholas.mainardi@polimi.it June 6, 2018 1 Partly Based on Alessandro

More information

Divide and Conquer. Chapter Overview. 1.2 Divide and Conquer

Divide and Conquer. Chapter Overview. 1.2 Divide and Conquer Chapter 1 Divide and Conquer 1.1 Overview This chapter introduces Divide and Conquer, which is a technique for designing recursive algorithms that are (sometimes) asymptotically efficient. It also presents

More information

Algorithms Design & Analysis. Analysis of Algorithm

Algorithms Design & Analysis. Analysis of Algorithm Algorithms Design & Analysis Analysis of Algorithm Review Internship Stable Matching Algorithm 2 Outline Time complexity Computation model Asymptotic notions Recurrence Master theorem 3 The problem of

More information

The maximum-subarray problem. Given an array of integers, find a contiguous subarray with the maximum sum. Very naïve algorithm:

The maximum-subarray problem. Given an array of integers, find a contiguous subarray with the maximum sum. Very naïve algorithm: The maximum-subarray problem Given an array of integers, find a contiguous subarray with the maximum sum. Very naïve algorithm: Brute force algorithm: At best, θ(n 2 ) time complexity 129 Can we do divide

More information

Computational Complexity. This lecture. Notes. Lecture 02 - Basic Complexity Analysis. Tom Kelsey & Susmit Sarkar. Notes

Computational Complexity. This lecture. Notes. Lecture 02 - Basic Complexity Analysis. Tom Kelsey & Susmit Sarkar. Notes Computational Complexity Lecture 02 - Basic Complexity Analysis Tom Kelsey & Susmit Sarkar School of Computer Science University of St Andrews http://www.cs.st-andrews.ac.uk/~tom/ twk@st-andrews.ac.uk

More information