Chapter 2. Divide-and-conquer algorithms
The divide-and-conquer strategy solves a problem by:
1. Breaking it into subproblems that are themselves smaller instances of the same type of problem.
2. Recursively solving these subproblems.
3. Appropriately combining their answers.
Multiplication
Johann Carl Friedrich Gauss (1777-1855): 1 + 2 + ... + 100 = 100(1 + 100)/2 = 5050.
Gauss once noticed that although the product of two complex numbers (a + bi)(c + di) = ac − bd + (bc + ad)i seems to involve four real-number multiplications, it can in fact be done with just three: ac, bd, and (a + b)(c + d), since bc + ad = (a + b)(c + d) − ac − bd. In our big-O way of thinking, reducing the number of multiplications from four to three seems wasted ingenuity. But this modest improvement becomes very significant when applied recursively.
Suppose x and y are two n-bit integers, and assume for convenience that n is a power of 2.

Lemma. For every n there exists an n' with n ≤ n' ≤ 2n such that n' is a power of 2.

As a first step toward multiplying x and y, we split each of them into their left and right halves, which are n/2 bits long:

x = x_L x_R = 2^{n/2} x_L + x_R,    y = y_L y_R = 2^{n/2} y_L + y_R.

Then

xy = (2^{n/2} x_L + x_R)(2^{n/2} y_L + y_R) = 2^n x_L y_L + 2^{n/2}(x_L y_R + x_R y_L) + x_R y_R.

The additions take linear time, as do the multiplications by powers of 2. The significant operations are the four n/2-bit multiplications; these we can handle by four recursive calls.
Our method for multiplying n-bit numbers starts by making recursive calls to multiply these four pairs of n/2-bit numbers, and then evaluates the preceding expression in O(n) time. Writing T(n) for the overall running time on n-bit inputs, we get the recurrence relation

T(n) = 4T(n/2) + O(n),

with solution O(n^2). By Gauss's trick, three multiplications, x_L y_L, x_R y_R, and (x_L + x_R)(y_L + y_R), suffice, since x_L y_R + x_R y_L = (x_L + x_R)(y_L + y_R) − x_L y_L − x_R y_R.
A divide-and-conquer algorithm for integer multiplication

multiply(x, y)
// Input: positive integers x and y, in binary
// Output: their product
1. n = max(size of x, size of y), rounded up to a power of 2
2. if n = 1 then return xy
3. x_L, x_R = leftmost n/2, rightmost n/2 bits of x
4. y_L, y_R = leftmost n/2, rightmost n/2 bits of y
5. P1 = multiply(x_L, y_L)
6. P2 = multiply(x_R, y_R)
7. P3 = multiply(x_L + x_R, y_L + y_R)
8. return P1 · 2^n + (P3 − P1 − P2) · 2^{n/2} + P2
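The pseudocode above translates almost line for line into Python. This sketch is ours, not from the slides; it uses Python's arbitrary-precision integers and a small-number base case in place of the n = 1 test:

```python
def multiply(x, y):
    """Karatsuba multiplication of nonnegative integers (a sketch of the
    divide-and-conquer scheme above)."""
    if x < 16 or y < 16:                        # base case: multiply directly
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    x_L, x_R = x >> half, x & ((1 << half) - 1)  # left and right halves of x
    y_L, y_R = y >> half, y & ((1 << half) - 1)
    p1 = multiply(x_L, y_L)
    p2 = multiply(x_R, y_R)
    p3 = multiply(x_L + x_R, y_L + y_R)          # Gauss's trick: one product
    # p3 - p1 - p2 equals the cross term x_L*y_R + x_R*y_L
    return (p1 << (2 * half)) + ((p3 - p1 - p2) << half) + p2
```

The shifts play the role of the multiplications by 2^n and 2^{n/2}, which take linear time.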
The time analysis

The recurrence relation: T(n) = 3T(n/2) + O(n).

- The algorithm's recursive calls form a tree structure. At each successive level of recursion the subproblems get halved in size. At the (log_2 n)-th level, the subproblems get down to size 1, and so the recursion ends.
- The height of the tree is log_2 n.
- The branching factor is 3: each problem recursively produces three smaller ones, so at depth k in the tree there are 3^k subproblems, each of size n/2^k.

For each subproblem, a linear amount of work is done in identifying further subproblems and combining their answers. Therefore the total time spent at depth k in the tree is

3^k · O(n/2^k) = (3/2)^k · O(n).
The time analysis (cont'd)

3^k · O(n/2^k) = (3/2)^k · O(n).

At the very top level, when k = 0, we need O(n). At the bottom, when k = log_2 n, it is O(3^{log_2 n}) = O(n^{log_2 3}). Between these two endpoints, the work done increases geometrically from O(n) to O(n^{log_2 3}), by a factor of 3/2 per level. The sum of any increasing geometric series is, within a constant factor, simply the last term of the series. Therefore the overall running time is

O(n^{log_2 3}) ≈ O(n^{1.59}).

We can do even better!
Recurrence relations
Master theorem

If T(n) = a T(⌈n/b⌉) + O(n^d) for some constants a > 0, b > 1, and d ≥ 0, then

T(n) = O(n^d)           if d > log_b a
T(n) = O(n^d log n)     if d = log_b a
T(n) = O(n^{log_b a})   if d < log_b a.
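The three cases can be mechanized as a small lookup. This helper is our own illustrative aid, not part of the text; the function name master_bound is an assumption:

```python
import math

def master_bound(a, b, d):
    """Return the asymptotic bound given by the Master Theorem for
    T(n) = a*T(n/b) + O(n^d), as a human-readable string."""
    c = math.log(a, b)            # the critical exponent log_b(a)
    if d > c:
        return f"O(n^{d})"
    if abs(d - c) < 1e-12:        # d == log_b(a): the balanced case
        return f"O(n^{d} log n)"
    return f"O(n^{round(c, 2)})"  # d < log_b(a): the leaves dominate
```

For instance, merge sort (a = 2, b = 2, d = 1) lands in the middle case, and Karatsuba (a = 3, b = 2, d = 1) in the third.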
Proof of Master Theorem

Assume that n is a power of b. This will not influence the final bound in any important way: n is at most a multiplicative factor of b away from some power of b. Next, notice that the size of the subproblems decreases by a factor of b with each level of recursion, and therefore reaches the base case after log_b n levels. This is the height of the recursion tree. The branching factor of the recursion tree is a, so the k-th level of the tree is made up of a^k subproblems, each of size n/b^k. The total work done at this level is

a^k · O(n/b^k)^d = O(n^d) · (a/b^d)^k.
Proof of Master Theorem (cont'd)

The total work done is

Σ_{k=0}^{log_b n} O(n^d) · (a/b^d)^k.

It is the sum of a geometric series with ratio a/b^d.

1. The ratio is less than 1. Then the series is decreasing, and its sum is just given by its first term, O(n^d).
2. The ratio is greater than 1. The series is increasing and its sum is given by its last term, O(n^{log_b a}):

n^d · (a/b^d)^{log_b n} = n^d · (a^{log_b n} / (b^{log_b n})^d) = a^{log_b n} = a^{(log_a n)(log_b a)} = n^{log_b a}.

3. The ratio is exactly 1. In this case all O(log n) terms of the series are equal to O(n^d).
Merge sort
The algorithm

mergesort(a[1...n])
// Input: an array of numbers a[1...n]
// Output: a sorted version of this array
1. if n > 1 then
2.   return merge(mergesort(a[1...⌊n/2⌋]),
3.                mergesort(a[⌊n/2⌋+1...n]))
4. else return a

merge(x[1...k], y[1...ℓ])
// Input: two sorted arrays x and y
// Output: a sorted version of the union of x and y
1. if k = 0 then return y[1...ℓ]
2. if ℓ = 0 then return x[1...k]
3. if x[1] ≤ y[1]
4.   then return x[1] ∘ merge(x[2...k], y[1...ℓ])
5.   else return y[1] ∘ merge(x[1...k], y[2...ℓ])

(Here ∘ denotes concatenation.)
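The pseudocode maps directly onto Python lists. This sketch is ours; it mirrors the recursive structure faithfully, although the list slicing makes it slower than an in-place merge (a production version would merge iteratively):

```python
def merge(x, y):
    """Merge two sorted lists into one sorted list, as in the pseudocode."""
    if not x:
        return y
    if not y:
        return x
    if x[0] <= y[0]:
        return [x[0]] + merge(x[1:], y)
    return [y[0]] + merge(x, y[1:])

def mergesort(a):
    """Sort a list by divide and conquer: T(n) = 2T(n/2) + O(n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```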
The time analysis

The recurrence relation: T(n) = 2T(n/2) + O(n). By the Master Theorem, T(n) = O(n log n).
An n log n lower bound for sorting

Sorting algorithms can be depicted as trees. The depth of the tree, the number of comparisons on the longest path from root to leaf, is exactly the worst-case time complexity of the algorithm. Consider any such tree that sorts an array of n elements. Each of its leaves is labeled by a permutation of {1, 2, ..., n}, and every permutation must appear as the label of a leaf. This is a binary tree with n! leaves. So the depth of our tree, and hence the complexity of our algorithm, must be at least

log(n!) ≈ log(√(π(2n + 1/3)) · n^n · e^{−n}) = Ω(n log n),

where we use Stirling's formula.
Median
Median

The median of a list of numbers is its 50th percentile: half the numbers are bigger than it, and half are smaller. If the list has even length, there are two choices for what the middle element could be, in which case we pick the smaller of the two, say. The purpose of the median is to summarize a set of numbers by a single, typical value. Computing the median of n numbers is easy: just sort them. The drawback is that this takes O(n log n) time, whereas we would ideally like something linear. We have reason to be hopeful, because sorting is doing far more work than we really need: we just want the middle element and don't care about the relative ordering of the rest of them.
Selection
Input: A list of numbers S; an integer k.
Output: The k-th smallest element of S.
A randomized divide-and-conquer algorithm for selection

For any number v, imagine splitting the list S into three categories:
- elements smaller than v, denoted S_L;
- those equal to v, denoted S_v (there might be duplicates); and
- those greater than v, denoted S_R.

Then

selection(S, k) = selection(S_L, k)                   if k ≤ |S_L|
selection(S, k) = v                                   if |S_L| < k ≤ |S_L| + |S_v|
selection(S, k) = selection(S_R, k − |S_L| − |S_v|)   if k > |S_L| + |S_v|.
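The three-way split and the recursion rule above can be sketched directly in Python (our illustration; the pivot is chosen at random, as discussed next):

```python
import random

def selection(S, k):
    """Return the k-th smallest element of S (1-indexed), by the randomized
    divide-and-conquer rule above."""
    v = random.choice(S)                     # random pivot v
    S_L = [x for x in S if x < v]            # elements smaller than v
    S_v = [x for x in S if x == v]           # elements equal to v
    S_R = [x for x in S if x > v]            # elements greater than v
    if k <= len(S_L):
        return selection(S_L, k)
    if k <= len(S_L) + len(S_v):
        return v
    return selection(S_R, k - len(S_L) - len(S_v))
```

The answer is the same regardless of which pivots the random choices produce; only the running time varies.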
How to choose v?

It should be picked quickly, and it should shrink the array substantially, the ideal situation being |S_L|, |S_R| ≈ |S|/2. If we could always guarantee this situation, we would get a running time of T(n) = T(n/2) + O(n) = O(n). But this requires picking v to be the median, which is our ultimate goal! Instead, we follow a much simpler alternative: we pick v randomly from S.
How to choose v? (cont'd)

Worst-case scenario: the pivots force our selection algorithm to perform

n + (n − 1) + (n − 2) + ... + n/2 = Θ(n²)

operations. Best-case scenario: O(n). Where, in this spectrum from O(n) to Θ(n²), does the average running time lie? Fortunately, it lies very close to the best-case time.
The efficiency analysis

Call v good if it lies within the 25th to 75th percentile of the array from which it is chosen. A randomly chosen v has a 50% chance of being good.

Lemma. On average, a fair coin needs to be tossed two times before a heads is seen.

Proof. Let E be the expected number of tosses before a heads is seen. We always need at least one toss; if it's heads, we're done. If it's tails (with probability 1/2), we must start over. Hence E = 1 + (1/2)E, whose solution is E = 2.
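As a quick sanity check of the lemma (our own illustrative simulation, not from the text), we can estimate the expectation empirically with a seeded random generator:

```python
import random

def expected_tosses(trials=100_000, seed=0):
    """Estimate the expected number of fair-coin tosses until the first heads."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses = 1                       # we always need at least one toss
        while rng.random() < 0.5:        # tails with probability 1/2: toss again
            tosses += 1
        total += tosses
    return total / trials
```

With 100,000 trials the estimate lands very close to the exact value E = 2.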
The efficiency analysis (cont'd)

Letting T(n) be the expected running time on an array of size n, we get

T(n) ≤ T(3n/4) + O(n) = O(n).
Matrix multiplication
The product of two n × n matrices X and Y is a third n × n matrix Z = XY, with (i, j)-th entry

Z_ij = Σ_{k=1}^n X_ik Y_kj.

That is, Z_ij is the dot product of the i-th row of X with the j-th column of Y. In general, XY is not the same as YX; matrix multiplication is not commutative. The preceding formula implies an O(n^3) algorithm for matrix multiplication.
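The dot-product formula gives the straightforward triple loop. A minimal Python sketch (ours) on nested-list matrices:

```python
def matmul(X, Y):
    """Naive O(n^3) matrix multiplication: Z[i][j] is the dot product of
    row i of X with column j of Y."""
    n, m, p = len(X), len(Y), len(Y[0])
    assert all(len(row) == m for row in X), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]
```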
Volker Strassen (1936– )
In 1969, the German mathematician Volker Strassen announced a surprising O(n^{2.81}) algorithm.
Divide and conquer

X = [ A  B ]      Y = [ E  F ]
    [ C  D ],         [ G  H ].

Then

XY = [ A  B ] [ E  F ]  =  [ AE + BG   AF + BH ]
     [ C  D ] [ G  H ]     [ CE + DG   CF + DH ].

To compute the size-n product XY, recursively compute eight size-n/2 products AE, BG, AF, BH, CE, DG, CF, DH, and then do some O(n^2)-time addition. The recurrence is

T(n) = 8T(n/2) + O(n^2),

with solution O(n^3).
Strassen's trick

XY = [ P5 + P4 − P2 + P6    P1 + P2            ]
     [ P3 + P4              P1 + P5 − P3 − P7  ],

where

P1 = A(F − H)        P5 = (A + D)(E + H)
P2 = (A + B)H        P6 = (B − D)(G + H)
P3 = (C + D)E        P7 = (A − C)(E + F)
P4 = D(G − E)

The recurrence is

T(n) = 7T(n/2) + O(n^2),

with solution O(n^{log_2 7}) ≈ O(n^{2.81}).
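The seven products translate directly into code. This Python sketch is ours; it assumes square matrices with n a power of 2 and makes no attempt at the usual practical optimizations (cutting over to the naive method for small blocks):

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def split(X):
    """Split an n x n matrix (n a power of 2) into four n/2 x n/2 blocks."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def strassen(X, Y):
    """Strassen's seven-product scheme: T(n) = 7T(n/2) + O(n^2)."""
    n = len(X)
    if n == 1:
        return [[X[0][0] * Y[0][0]]]
    A, B, C, D = split(X)
    E, F, G, H = split(Y)
    P1 = strassen(A, sub(F, H))
    P2 = strassen(add(A, B), H)
    P3 = strassen(add(C, D), E)
    P4 = strassen(D, sub(G, E))
    P5 = strassen(add(A, D), add(E, H))
    P6 = strassen(sub(B, D), add(G, H))
    P7 = strassen(sub(A, C), add(E, F))
    top_left  = add(sub(add(P5, P4), P2), P6)   # P5 + P4 - P2 + P6
    top_right = add(P1, P2)                     # P1 + P2
    bot_left  = add(P3, P4)                     # P3 + P4
    bot_right = sub(sub(add(P1, P5), P3), P7)   # P1 + P5 - P3 - P7
    top = [l + r for l, r in zip(top_left, top_right)]
    bot = [l + r for l, r in zip(bot_left, bot_right)]
    return top + bot
```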
Faster Polynomial Multiplication
Let A and B be polynomials of degree at most n:

A(x) = Σ_{j=0}^n a_j x^j for some a_0, ..., a_n ∈ R,
B(x) = Σ_{j=0}^n b_j x^j for some b_0, ..., b_n ∈ R.

We wish to calculate C(x) = A(x)B(x). Necessarily deg(C) = 2n, and so

C(x) = Σ_{j=0}^{2n} c_j x^j

for some c_0, ..., c_2n ∈ R.
Problem: Given the coefficients of A and B, compute the coefficients of C.

It is easy to see that

c_j = Σ_{i=0}^j a_i b_{j−i}    (0 ≤ j ≤ 2n).

This suggests the naive algorithm:

for (int j ∈ {0, ..., 2n}) do
    c_j = 0
    for (int i ∈ {0, ..., j}) do
        c_j += a_i b_{j−i}

We use the real-number model of computation: an arithmetic operation on real numbers has unit cost. Cost of the naive algorithm? Θ(n²). Can we do better? Yes!
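The naive convolution is a short double loop. A Python sketch (the function name is ours):

```python
def poly_mul_naive(a, b):
    """Θ(n^2) polynomial multiplication on coefficient lists:
    c[j] = sum over i of a[i] * b[j - i]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c
```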
Without loss of generality, assume n is a power of two. Write

A(x) = A_1(x) x^{n/2} + A_2(x),
B(x) = B_1(x) x^{n/2} + B_2(x),

where deg A_1, deg A_2, deg B_1, deg B_2 ≤ n/2. Then

C(x) = A_1(x)B_1(x) x^n + (A_2(x)B_1(x) + A_1(x)B_2(x)) x^{n/2} + A_2(x)B_2(x).

Use Gauss's trick:

P_1(x) = A_1(x)B_1(x)
P_2(x) = A_2(x)B_2(x)
P_3(x) = (A_1(x) + A_2(x))(B_1(x) + B_2(x))
S(x) = P_3(x) − P_1(x) − P_2(x)
C(x) = P_1(x) x^n + S(x) x^{n/2} + P_2(x)
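The recursion mirrors the integer case. This Python sketch is ours; the helper names are assumptions, and it splits by list position rather than requiring n to be a power of two:

```python
def poly_add(a, b):
    """Add coefficient lists of possibly different lengths."""
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(max(len(a), len(b)))]

def poly_sub(a, b):
    """Subtract coefficient lists of possibly different lengths."""
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
            for i in range(max(len(a), len(b)))]

def poly_mul(a, b):
    """Karatsuba polynomial multiplication: three recursive products."""
    if not a or not b:
        return []
    if len(a) == 1:
        return [a[0] * x for x in b]
    if len(b) == 1:
        return [y * b[0] for y in a]
    h = max(len(a), len(b)) // 2
    a_lo, a_hi = a[:h], a[h:]        # a = a_hi * x^h + a_lo
    b_lo, b_hi = b[:h], b[h:]
    p1 = poly_mul(a_hi, b_hi)
    p2 = poly_mul(a_lo, b_lo)
    p3 = poly_mul(poly_add(a_lo, a_hi), poly_add(b_lo, b_hi))
    s = poly_sub(poly_sub(p3, p1), p2)          # the cross terms
    out = [0] * (len(a) + len(b) - 1)
    for i, v in enumerate(p2):
        out[i] += v
    for i, v in enumerate(s):
        out[i + h] += v
    for i, v in enumerate(p1):
        out[i + 2 * h] += v
    return out
```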
We then find

T(n) = 3T(n/2) + Θ(n).

By the Master Theorem, we have T(n) = Θ(n^{log_2 3}) ≈ Θ(n^{1.59}). Can we do better? Yes!
If we follow the approach that the Toom-Cook algorithm uses in multiplying n-bit numbers, we can show that for every ε > 0, we can multiply two polynomials of degree n using Θ(n^{1+ε}) arithmetic operations. However, the Θ-constant blows up as ε → 0. Can we do better? Yes! (Use the fast Fourier transform.)
The fast Fourier transform
Polynomial multiplication

If A(x) = a_0 + a_1 x + ... + a_d x^d and B(x) = b_0 + b_1 x + ... + b_d x^d, their product

C(x) = A(x) · B(x) = c_0 + c_1 x + ... + c_{2d} x^{2d}

has coefficients

c_k = a_0 b_k + a_1 b_{k−1} + ... + a_k b_0 = Σ_{i=0}^k a_i b_{k−i},

where for i > d we take a_i and b_i to be zero. Computing c_k from this formula takes O(k) steps, and finding all 2d + 1 coefficients would therefore seem to require Θ(d²) time.
An alternative representation of polynomials

Fact. A degree-d polynomial is uniquely characterized by its values at any d + 1 distinct points. E.g., any two points determine a line.

We can specify a degree-d polynomial A(x) = a_0 + a_1 x + ... + a_d x^d by either one of the following:
1. Its coefficients a_0, a_1, ..., a_d. (coefficient representation)
2. The values A(x_0), A(x_1), ..., A(x_d). (value representation)
An alternative representation of polynomials (cont'd)

coefficient representation → (evaluation) → value representation
coefficient representation ← (interpolation) ← value representation

- The product C(x) has degree 2d; it is completely determined by its values at any 2d + 1 points.
- The value of C(z) at any given point z is just A(z) times B(z).

Thus polynomial multiplication takes linear time in the value representation.
The algorithm

Input: Coefficients of two polynomials, A(x) and B(x), of degree d
Output: Their product C = A · B

Selection: Pick some points x_0, x_1, ..., x_{n−1}, where n ≥ 2d + 1
Evaluation: Compute A(x_0), A(x_1), ..., A(x_{n−1}) and B(x_0), B(x_1), ..., B(x_{n−1})
Multiplication: Compute C(x_k) = A(x_k)B(x_k) for all k = 0, ..., n − 1
Interpolation: Recover C(x) = c_0 + c_1 x + ... + c_{2d} x^{2d}
The selection step and the n multiplications are just linear time. In a typical setting for polynomial multiplication, the coefficients of the polynomials are real numbers and, moreover, are small enough that basic arithmetic operations (adding and multiplying) take unit time. Evaluating a polynomial of degree d ≤ n at a single point takes O(n), and so the baseline for n points is Θ(n²). The fast Fourier transform (FFT) does it in just O(n log n) time, for a particularly clever choice of x_0, ..., x_{n−1} in which the computations required by the individual points overlap with one another and can be shared.
Evaluation by divide-and-conquer

We pick the n points ±x_0, ±x_1, ..., ±x_{n/2−1}; then the computations required for each A(x_i) and A(−x_i) overlap a lot, because the even powers of x_i coincide with those of −x_i. To investigate this, we need to split A(x) into its odd and even powers, for instance

3 + 4x + 6x² + 2x³ + x⁴ + 10x⁵ = (3 + 6x² + x⁴) + x(4 + 2x² + 10x⁴).

More generally

A(x) = A_e(x²) + x A_o(x²),

where A_e(·), with the even-numbered coefficients, and A_o(·), with the odd-numbered coefficients, are polynomials of degree ≤ n/2 − 1 (assume for convenience that n is even).
Evaluation by divide-and-conquer (cont'd)

Given paired points ±x_i, the calculations needed for A(x_i) can be recycled toward computing A(−x_i):

A(x_i) = A_e(x_i²) + x_i A_o(x_i²),
A(−x_i) = A_e(x_i²) − x_i A_o(x_i²).

In other words, evaluating A(x) at n paired points ±x_0, ..., ±x_{n/2−1} reduces to evaluating A_e(x) and A_o(x) (which each have half the degree of A(x)) at just n/2 points, x_0², ..., x_{n/2−1}². If we could recurse, we would get a divide-and-conquer procedure with running time

T(n) = 2T(n/2) + O(n) = O(n log n).
How to choose the n points?

We have a problem: to recurse at the next level, we need the n/2 evaluation points x_0², x_1², ..., x_{n/2−1}² to be themselves plus-minus pairs. How can a square be negative? We use complex numbers. At the very bottom of the recursion, we have a single point. This point might as well be 1, in which case the level above it must consist of its square roots, ±√1 = ±1. The next level up then has ±√(+1) = ±1 as well as the complex numbers ±√(−1) = ±i, where i is the imaginary unit. By continuing in this manner, we eventually reach the initial set of n points: the complex n-th roots of unity, that is, the n complex solutions to the equation z^n = 1.
The complex roots of unity

The n-th roots of unity are the complex numbers 1, ω, ω², ..., ω^{n−1}, where ω = e^{2πi/n}. If n is even, then:
1. The n-th roots are plus-minus paired: ω^{n/2+j} = −ω^j.
2. Squaring them produces the (n/2)-nd roots of unity.

Therefore, if we start with these numbers for some n that is a power of 2, then at successive levels of recursion we will have the (n/2^k)-th roots of unity, for k = 0, 1, 2, 3, .... All these sets of numbers are plus-minus paired, and so our divide-and-conquer works perfectly. The resulting algorithm is the fast Fourier transform.
The fast Fourier transform (polynomial formulation)

FFT(A, ω)
Input: coefficient representation of a polynomial A(x) of degree ≤ n − 1, where n is a power of 2; ω, an n-th root of unity
Output: value representation A(ω^0), ..., A(ω^{n−1})
1. if ω = 1 then return A(1)
2. express A(x) in the form A_e(x²) + x A_o(x²)
3. call FFT(A_e, ω²) to evaluate A_e at even powers of ω
4. call FFT(A_o, ω²) to evaluate A_o at even powers of ω
5. for j = 0 to n − 1 do
6.   compute A(ω^j) = A_e(ω^{2j}) + ω^j A_o(ω^{2j})
7. return A(ω^0), ..., A(ω^{n−1})
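A direct Python transcription (ours): the base case tests n = 1 rather than ω = 1, the even/odd split is done with list slicing, and cmath supplies the root of unity. Floating-point roots mean the output values are approximate:

```python
import cmath

def fft(a, omega=None):
    """Evaluate the polynomial with coefficient list a at the n-th roots of
    unity, n = len(a) a power of 2."""
    n = len(a)
    if omega is None:
        omega = cmath.exp(2j * cmath.pi / n)   # omega = e^{2*pi*i/n}
    if n == 1:
        return [a[0]]
    s  = fft(a[0::2], omega * omega)   # even-indexed coefficients: A_e
    sp = fft(a[1::2], omega * omega)   # odd-indexed coefficients:  A_o
    r = [0] * n
    w = 1
    for j in range(n // 2):
        # A(omega^j) and A(omega^{j+n/2}) = A(-omega^j) share A_e, A_o values
        r[j]          = s[j] + w * sp[j]
        r[j + n // 2] = s[j] - w * sp[j]
        w *= omega
    return r
```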
Interpolation

We designed the FFT, a way to move from coefficients to values in time just O(n log n), when the points {x_i} are the complex n-th roots of unity 1, ω, ω², ..., ω^{n−1}:

⟨values⟩ = FFT(⟨coefficients⟩, ω).

We will see that interpolation can be computed as

⟨coefficients⟩ = (1/n) FFT(⟨values⟩, ω^{−1}).
A matrix reformulation

Let's explicitly set down the relationship between our two representations for a polynomial A(x) of degree ≤ n − 1:

[ A(x_0)     ]   [ 1  x_0      x_0²      ...  x_0^{n−1}     ] [ a_0     ]
[ A(x_1)     ] = [ 1  x_1      x_1²      ...  x_1^{n−1}     ] [ a_1     ]
[   ...      ]   [              ...                         ] [  ...    ]
[ A(x_{n−1}) ]   [ 1  x_{n−1}  x_{n−1}²  ...  x_{n−1}^{n−1} ] [ a_{n−1} ]

Let M be the matrix in the middle, which is a Vandermonde matrix: if x_0, ..., x_{n−1} are distinct numbers, then M is invertible. Hence: evaluation is multiplication by M, while interpolation is multiplication by M^{−1}.
Interpolation resolved

In linear algebra terms, the FFT multiplies an arbitrary n-dimensional vector, which we have been calling the coefficient representation, by the n × n matrix

           [ 1  1        1          ...  1               ]
           [ 1  ω        ω²         ...  ω^{n−1}         ]
M_n(ω) =   [             ...                             ]
           [ 1  ω^j      ω^{2j}     ...  ω^{(n−1)j}      ]
           [             ...                             ]
           [ 1  ω^{n−1}  ω^{2(n−1)} ...  ω^{(n−1)(n−1)}  ]

Its (j, k)-th entry (starting row- and column-count at zero) is ω^{jk}. We will show:

Inversion formula: M_n(ω)^{−1} = (1/n) M_n(ω^{−1}).
Interpolation resolved (cont'd)

Take ω to be e^{2πi/n} and think of the columns of M as vectors in C^n. Recall that the angle between two vectors u = (u_0, ..., u_{n−1}) and v = (v_0, ..., v_{n−1}) in C^n is just a scaling factor times their inner product

u · v = u_0 v̄_0 + u_1 v̄_1 + ... + u_{n−1} v̄_{n−1},

where z̄ denotes the complex conjugate of z. Recall that the complex conjugate of a complex number z = re^{iθ} is z̄ = re^{−iθ}. The complex conjugate of a vector (or matrix) is obtained by taking the complex conjugates of all its entries. The above quantity is maximized when the vectors lie in the same direction and is zero when the vectors are orthogonal to each other.
Interpolation resolved (cont'd)

Lemma. The columns of matrix M are orthogonal to each other.

Proof. Take the inner product of any columns j and k of matrix M:

1 + ω^{j−k} + ω^{2(j−k)} + ... + ω^{(n−1)(j−k)}.

This is a geometric series with first term 1, last term ω^{(n−1)(j−k)}, and ratio ω^{j−k}. Therefore, if j ≠ k, it evaluates to

(1 − ω^{n(j−k)}) / (1 − ω^{j−k}) = 0,

and if j = k, it evaluates to n.

Corollary. M M* = nI, i.e., M_n(ω)^{−1} = (1/n) M_n(ω^{−1}).
The fast Fourier transform (general formulation)

FFT(a, ω)
Input: An array a = (a_0, a_1, ..., a_{n−1}), for n a power of 2; a primitive n-th root of unity, ω
Output: M_n(ω) · a
1. if ω = 1 then return a
2. (s_0, s_1, ..., s_{n/2−1}) = FFT((a_0, a_2, ..., a_{n−2}), ω²)
3. (s'_0, s'_1, ..., s'_{n/2−1}) = FFT((a_1, a_3, ..., a_{n−1}), ω²)
4. for j = 0 to n/2 − 1 do
5.   r_j = s_j + ω^j s'_j
6.   r_{j+n/2} = s_j − ω^j s'_j
7. return (r_0, r_1, ..., r_{n−1})
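Putting evaluation, pointwise multiplication, and the inversion formula ⟨coefficients⟩ = (1/n) FFT(⟨values⟩, ω^{−1}) together gives the full O(n log n) pipeline. This Python sketch is ours; the base case tests n = 1 rather than ω = 1, and the final rounding assumes integer coefficients:

```python
import cmath

def fft(a, omega):
    """Return M_n(omega) * a for n = len(a) a power of 2, as in the pseudocode."""
    n = len(a)
    if n == 1:
        return list(a)
    s  = fft(a[0::2], omega * omega)   # even-indexed entries
    sp = fft(a[1::2], omega * omega)   # odd-indexed entries
    r = [0] * n
    for j in range(n // 2):
        t = omega ** j * sp[j]
        r[j], r[j + n // 2] = s[j] + t, s[j] - t
    return r

def poly_multiply(a, b):
    """Multiply two integer coefficient lists via evaluate-multiply-interpolate."""
    n = 1
    while n < len(a) + len(b) - 1:     # need >= 2d+1 points; round up to a power of 2
        n *= 2
    omega = cmath.exp(2j * cmath.pi / n)
    fa = fft(a + [0] * (n - len(a)), omega)        # evaluation
    fb = fft(b + [0] * (n - len(b)), omega)
    fc = [u * v for u, v in zip(fa, fb)]           # pointwise multiplication
    c = fft(fc, 1 / omega)                         # interpolation: FFT at omega^-1...
    return [round((z / n).real)                    # ...then divide by n
            for z in c[:len(a) + len(b) - 1]]
```

The rounding step cleans up the small floating-point errors introduced by the complex roots of unity.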