Intensive Algorithms                         Lecture 11: DFT and DP
Lecturer: Daniel A. Spielman                 February 20, 2018

11.1 Introduction

The purpose of this lecture is to learn how to use the Discrete Fourier Transform to save space in Dynamic Programming. If there is time at the end, I will show a fast algorithm for computing the matrix permanent, and talk a little about Gray codes. The main result comes from the paper Saving Space by Algebraization by Lokshtanov and Nederlof [LN10]. I do not recommend the paper, as it solves a more abstract problem and spends a lot of time on numerical analysis, which we will ignore.

Since this algorithm will involve many multiplications, I will state some improvements on the multiplication algorithm we encountered last week. A classic result of Schönhage and Strassen [SS71] tells us that we can compute the product of two B-bit integers in time O(B log B log log B). A few years ago Fürer [Für09] improved this to time O(B log B · K^{log* B}), for some constant K. This latter result is slightly faster. But the difference will not matter to us in this lecture.

In fact, the precise expressions for the complexities of the algorithms we will encounter this lecture would be very difficult to state. For this reason, we introduce the notation Õ. We say that f(n) ∈ Õ(g(n)) if there is some constant c for which f(n) ∈ O(g(n) log^c g(n)). This enables us to skip writing low-order logarithmic terms. In this notation, we say that we can multiply two B-bit integers in time Õ(B). Similarly, we can multiply two numbers to B bits of precision in time Õ(B).

We will also require algorithms for fast exponentiation. For an integer a, we can compute x^a using only O(log a) arithmetic operations: we compute x^{2^i} for various i by repeated squaring, and then multiply together the appropriate terms. For example, x^11 = x^1 · x^2 · x^8. In general, we expand a in binary as a = b_0 + 2 b_1 + 4 b_2 + ... + 2^k b_k, and then compute

    x^a = ∏_{j : b_j = 1} x^{2^j}

by the following loop:
1. Set z = x and y = 1.
2. For j in 0 to k:
   a. If b_j = 1, set y = y · z.
   b. Set z = z^2.
3. Return y.

11.2 The DFT and inverse DFT

We begin by recalling the Discrete Fourier Transform from last lecture, and then derive its inverse. Let ω = e^{2πi/n}. The DFT is the linear transformation that maps a vector (a_0, a_1, ..., a_{n-1}) to the vector (y_0, ..., y_{n-1}) given by

    y_k = Σ_{j=0}^{n-1} a_j (ω^k)^j = Σ_{j=0}^{n-1} a_j ω^{jk}.

That is, it treats its input vector as the coefficients of the polynomial

    p(x) = Σ_{j=0}^{n-1} a_j x^j,

and then sets y_k = p(ω^k) for all k.

It is sometimes useful to express the DFT as multiplication by a matrix. We may evaluate the polynomial p at a point x_0 by forming the vector of powers of x_0, and then taking its inner product with the vector of coefficients of p:

    p(x_0) = (1, x_0, x_0^2, ..., x_0^{n-1}) · (a_0, a_1, ..., a_{n-1})^T.

If we assemble all of those row vectors into a matrix, we obtain

    [ y_0     ]   [ 1  1        1          1        ...  1              ] [ a_0     ]
    [ y_1     ]   [ 1  ω        ω^2        ω^3      ...  ω^{n-1}        ] [ a_1     ]
    [ y_2     ] = [ 1  ω^2      ω^4        ω^6      ...  ω^{2(n-1)}     ] [ a_2     ]
    [ y_3     ]   [ 1  ω^3      ω^6        ω^9      ...  ω^{3(n-1)}     ] [ a_3     ]
    [ ...     ]   [ ...                                  ...            ] [ ...     ]
    [ y_{n-1} ]   [ 1  ω^{n-1}  ω^{2(n-1)}          ...  ω^{(n-1)(n-1)} ] [ a_{n-1} ]
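As a concrete sketch (in Python, not part of the original notes), the matrix-vector description above can be implemented directly with O(n²) operations; the FFT computes the same map in O(n log n):

```python
import cmath

def dft(a):
    """Direct O(n^2) DFT: evaluate p(x) = sum_j a[j] x^j at the n-th
    roots of unity, i.e., multiply by the matrix of powers of omega."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)  # omega = e^{2 pi i / n}
    return [sum(a[j] * w ** (j * k) for j in range(n)) for k in range(n)]
```

For example, dft([1, 1, 0, 0]) evaluates p(x) = 1 + x at the points 1, i, -1, -i (up to floating-point rounding).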
Last lecture, we learned an algorithm called the FFT for computing the DFT in time O(n log n) when n is a power of 2. Today's lecture will not actually require the FFT. But we will need the inverse of the DFT, which is what we use to interpolate the coefficients of a polynomial from its values.

Lemma 1. Let y_0, ..., y_{n-1} be the DFT of a_0, ..., a_{n-1}. Then,

    a_j = (1/n) Σ_{k=0}^{n-1} ω^{-jk} y_k.

Proof. By expanding y_k, we find

    Σ_{k=0}^{n-1} ω^{-jk} y_k = Σ_{k=0}^{n-1} ω^{-jk} Σ_{h=0}^{n-1} ω^{hk} a_h = Σ_{h=0}^{n-1} a_h Σ_{k=0}^{n-1} ω^{k(h-j)}.

We now prove that

    Σ_{k=0}^{n-1} ω^{k(h-j)} = { n if j = h; 0 otherwise }.

When h = j, all of the terms are 1 and the sum is n. For h ≠ j, (h - j) is not divisible by n and so ω^{h-j} ≠ 1. There are then two easy ways to see that the sum is 0. The algebraic way is to sum the geometric series:

    Σ_{k=0}^{n-1} (ω^{h-j})^k = (1 - (ω^{h-j})^n) / (1 - ω^{h-j}) = (1 - (ω^n)^{h-j}) / (1 - ω^{h-j}) = (1 - 1) / (1 - ω^{h-j}) = 0.

The geometric way is to observe that this is a sum of regularly spaced points around the unit circle, and thus evaluates to 0. Thus,

    (1/n) Σ_{k=0}^{n-1} ω^{-jk} y_k = (1/n) Σ_{h=0}^{n-1} a_h · { n if j = h; 0 otherwise } = a_j.

There are two things that you should take away from this. The first is that we can obtain the coefficient a_j by computing the inner product of the DFT of p(x) and the vector with entries ω^{-jk}, and then dividing by n. The second is that the computation that inverts the DFT is almost identical to the one that computes it.
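A sketch of Lemma 1 in Python (an illustration, not part of the notes): the inverse is the same computation with ω replaced by ω^{-1}, followed by a division by n.

```python
import cmath

def inverse_dft(y):
    """Invert the DFT via Lemma 1: a_j = (1/n) sum_k omega^{-jk} y_k."""
    n = len(y)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(y[k] * w ** (-j * k) for k in range(n)) / n
            for j in range(n)]
```

Feeding in the values of p(x) = 1 + x at the fourth roots of unity, namely (2, 1+i, 0, 1-i), recovers the coefficients (1, 1, 0, 0) up to rounding.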
11.3 Space and Subset Sum

We now recall that in the subset sum problem we are given as input a list of positive integers w_1, ..., w_n along with a target integer T, and need to determine if there is an S ⊆ {1, ..., n} such that

    Σ_{i ∈ S} w_i = T.

We may assume without loss of generality that all w_i ≤ T, as we can safely discard any w_i that are larger.

We can solve this problem by Dynamic Programming in time O(nT) by building a table of achievable sums. For example, we could set opt(j, t) to be true if there is an S ⊆ {1, ..., j} for which Σ_{i ∈ S} w_i = t. We can then compute all the entries in this table by the recurrence

    opt(j, t) = opt(j-1, t) or opt(j-1, t - w_j).

In fact, we can make this table a little smaller by dropping the first coordinate. Consider the following routine:

1. For t in 1 to T, set opt(t) = false. Set opt(0) = true.
2. For i in 1 to n:
   a. For t in T down to 1: if w_i ≤ t, set opt(t) = opt(t) or opt(t - w_i).

(The inner loop runs downward over t so that each w_i is used at most once.) You can show that after the ith iteration, opt(t) = opt(i, t) for all t.

We could even use a loop like this to count the number of solutions. Consider instead

1. For t in 1 to T, set c(t) = 0. Set c(0) = 1.
2. For i in 1 to n:
   a. For t in T down to 1: if w_i ≤ t, set c(t) = c(t) + c(t - w_i).

One can prove by induction that after the ith iteration, c(t) equals the number of subsets of {1, ..., i} that give the sum t.

We would now like to see how to solve this problem using less space. The important property of space / memory that we are going to exploit is that it can be reused. To get us used to this idea, I will show that we can solve this problem using O(n + log T) bits, but O(2^n n log T) time. This is the space and time required to go through every possible subset (there are 2^n), compute the sum for that subset (in time O(n log T)), and check if it equals T. We need n bits to keep track of the current subset, and O(log T) bits to write down the sum for that subset. It might seem that we could need as many as log(nT) bits, but we can avoid this by stopping if the sum exceeds T.
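One possible rendering of the counting loop in Python (a sketch; the inner loop runs downward over t so that each weight is used at most once):

```python
def count_subset_sums(w, T):
    """O(nT)-time dynamic program.  After processing the first i weights,
    c[t] is the number of subsets of w[:i] whose elements sum to t."""
    c = [0] * (T + 1)
    c[0] = 1                           # the empty set achieves sum 0
    for wi in w:
        for t in range(T, wi - 1, -1): # downward: item wi used at most once
            c[t] += c[t - wi]
    return c
```

For instance, with w = [1, 2, 3] and T = 6, the table ends as c[3] = 2 (the subsets {3} and {1, 2}) and c[6] = 1.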
This algorithm is reasonable if T ≥ 2^n; but it is slow if T ≪ 2^n. We want an algorithm that uses space on this order, but which is not too much slower than the dynamic programming algorithm.
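The O(2^n)-time, low-space enumeration just described might look like this in Python (a sketch: the integer mask plays the role of the n bits recording the current subset, and total is the O(log T)-bit running sum, cut off as soon as it exceeds T):

```python
def subset_sum_exists(w, T):
    """Try all 2^n subsets, storing only a subset counter and a running
    sum: O(n + log T) bits of space, O(2^n) iterations."""
    n = len(w)
    for mask in range(2 ** n):
        total = 0
        for i in range(n):
            if mask >> i & 1:
                total += w[i]
                if total > T:      # stop early so the sum never exceeds T
                    break
        else:                      # loop finished without overflowing T
            if total == T:
                return True
    return False
```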
11.4 Subset Sum by Polynomials

Consider the polynomial

    p(x) = ∏_{i=1}^{n} (1 + x^{w_i}).

The coefficient of x^t in this polynomial equals the number of subsets that give sum t. We called this quantity c(t) a few sections ago. Our plan is to use the DFT to compute the coefficient of x^T in low space. In addition to determining if there is a subset that achieves sum T, we will find out how many there are that do. We will, of course, do this using a Discrete Fourier Transform.

For each 1 ≤ i ≤ n, let p_i(x) = 1 + x^{w_i}. Note that we can compute p_i(x) in at most O(log w_i) operations: O(log w_i) multiplications to compute x^{w_i}, after which we add 1.

The following is a description of the algorithm for computing c(T). We will write it in a way that makes it easy to reason about its space usage.

1. Set N = T + 1, and ω = e^{2πi/N}.
2. Set c = 0.
3. For j in 0 to N - 1:
   a. Set z = 1 and x = ω^j.
   b. For k in 1 to n, set z = z · p_k(x).
   c. Set c = c + z · ω^{-jT}.
4. Return c/N.

The number returned at the end of the algorithm equals

    c(T) = (1/N) Σ_{j=0}^{N-1} ω^{-jT} p(ω^j),

and thus is the coefficient of x^T in the polynomial p.

I assert, and do not have time or space to prove, that it suffices to perform all the calculations with O(n + log T) bits of accuracy. At the end of this section, I will explain how we could do it modulo a prime.

The algorithm keeps all of its storage in 4 variables: j, k, z and c. So, its total space usage is O(n + log T). The main cost of the algorithm in time is the computation of p_k(x) and the multiplication by z. As each computation of p_k(x) requires O(log T) operations and is carried out to O(n + log T) bits of precision, it can be performed in time Õ(n log T + log² T). As there are nN ∈ O(n²T) such computations in the algorithm, the total time is Õ(n³T).
11.4.1 Modulo a prime

Instead of worrying about numerics, we could perform these computations modulo a prime. The most natural way of doing this would first involve discovering a prime N just a little bit larger than T + 1. Using randomized algorithms, we can do this with high probability in time polynomial in log(T). We then need to find a generator of the multiplicative group modulo N to use as ω. This is an element for which ω^{N-1} = 1, but ω^k ≠ 1 for 0 < k < N - 1. Again, there are algorithms for doing this quickly (assuming some standard number-theoretic conjectures are true).

If we now go through the computation, we will compute c(T) modulo N. This, however, does not necessarily allow us to determine if c(T) is zero or not: it could be divisible by N. The Chinese Remainder Theorem allows us to solve this problem: we just need to repeat the process with primes N_1, ..., N_q whose product exceeds c(T).

11.5 The Permanent

We probably won't make it to this material in class. But, I am writing it just in case.

The determinant of a square matrix is a fundamental quantity in linear algebra. Its magnitude is the volume of the parallelepiped whose axes are the columns of the matrix (remarkably, we get the same volume using the rows). One formula for the determinant is given by a sum over all permutations π of {1, ..., n}:

    det(M) = Σ_π sign(π) ∏_{i=1}^{n} M_{i,π(i)}.

Here the sign of a permutation, sign(π), is determined by the number of transpositions needed to construct π: it is 1 if the number is even and -1 if the number is odd. As there are n! permutations, this is a slow way to compute the determinant. It can be computed in polynomial time using standard tools from linear algebra. In fact, it can be computed in time O(n^ω), the time needed to multiply matrices.

The determinant looks very similar to the permanent of a matrix M, which has the same formula but without the signs:

    perm(M) = Σ_π ∏_{i=1}^{n} M_{i,π(i)}.

For example,

         [ a_1 a_2 a_3 ]
    perm [ b_1 b_2 b_3 ] = a_1 b_2 c_3 + a_2 b_1 c_3 + a_1 b_3 c_2 + a_3 b_1 c_2 + a_2 b_3 c_1 + a_3 b_2 c_1.
         [ c_1 c_2 c_3 ]

We believe that it is not possible to compute the permanent of a matrix in polynomial time, even when all of its entries are in {0, 1}.
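For reference (a sketch, not from the notes), the defining sum over permutations can be evaluated directly in O(n! · n) time:

```python
from itertools import permutations

def perm_naive(M):
    """Evaluate the permanent from its definition:
    perm(M) = sum over permutations pi of prod_i M[i][pi(i)]."""
    n = len(M)
    total = 0
    for pi in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= M[i][pi[i]]
        total += prod
    return total
```

On the all-ones n x n matrix this returns n!, since every one of the n! permutations contributes 1.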
But, we do know how to compute it in time much less than n!. By rearranging the computation, we can compute it using only O(n 2^n) operations. While these are both big numbers and it might not look like a big difference, it is better than every polynomial speedup: (2^n)^c ∈ o(n!) for every constant c.

To speed up this computation, we derive an alternative formula for the permanent. There are two that are famous: Ryser's formula and Glynn's formula (which seems to have been discovered long before Glynn published it). Glynn's formula involves a sum over vectors s ∈ {±1}^n:

    perm(M) = (1/2^n) Σ_{s ∈ {±1}^n} (∏_{j=1}^{n} s_j) ∏_{i=1}^{n} (s_1 M_{1,i} + s_2 M_{2,i} + ... + s_n M_{n,i}).

In fact, it suffices to sum over only the vectors s for which s_1 = 1, which saves a factor of 2 in the running time. For example, this formula gives

         [ a_1 a_2 ]
    perm [ b_1 b_2 ] = [(a_1 + b_1)(a_2 + b_2) - (a_1 - b_1)(a_2 - b_2)] / 2
                     = [a_1 a_2 + a_1 b_2 + b_1 a_2 + b_1 b_2 - a_1 a_2 + a_1 b_2 + b_1 a_2 - b_1 b_2] / 2
                     = (2 a_1 b_2 + 2 a_2 b_1) / 2 = a_1 b_2 + a_2 b_1.

To show that Glynn's formula is correct, we consider what happens to every monomial. The terms that can appear in the product can each be described by a function f : {1, ..., n} → {1, ..., n}:

    ∏_{i=1}^{n} s_{f(i)} M_{f(i),i}.

We will show that the only terms that survive are those for which f is a permutation. If f is not a permutation, then there is some k which is not in the image of f. This means that the corresponding monomial does not depend on s_k, while the sign ∏_j s_j in front flips when s_k does; the monomial thus appears an equal number of times with positive and negative sign.

For those functions f that are permutations, each s_j appears exactly once among the s_{f(i)}, and we can write:

    ∏_{i=1}^{n} s_{f(i)} M_{f(i),i} = (∏_{j=1}^{n} s_j) ∏_{i=1}^{n} M_{f(i),i}.

The product of the signs is cancelled by the product of the signs at the front of Glynn's formula, since each s_j² = 1.

A naive computation of Glynn's formula would take time O(n² 2^n), as the product of the signed row-sums looks like it requires n² operations. However, one can accelerate this by being careful about the order in which we traverse the vectors s. The best order is given by a Gray code. It changes one coordinate of s per step.
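Here is a sketch of Glynn's formula in Python, summing only over the vectors with s_1 = 1 and dividing by 2^{n-1}; for integer matrices the division is exact. This is the naive O(n² 2^n) version; the Gray-code ordering just mentioned would instead update the n signed column sums incrementally.

```python
from itertools import product

def perm_glynn(M):
    """Glynn's formula restricted to s_1 = +1:
    perm(M) = 2^{1-n} * sum_s (prod_j s_j) * prod_i (sum_j s_j M[j][i])."""
    n = len(M)
    total = 0
    for rest in product((1, -1), repeat=n - 1):
        s = (1,) + rest                  # fix s_1 = +1: saves a factor of 2
        sign = 1
        for sj in s:
            sign *= sj
        prod = 1
        for i in range(n):               # product of the signed column sums
            prod *= sum(s[j] * M[j][i] for j in range(n))
        total += sign * prod
    return total // 2 ** (n - 1)         # exact for integer entries
```

It agrees with the n!-term definition; for the 2 x 2 matrix in the worked example above it reproduces a_1 b_2 + a_2 b_1.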
There are very efficient algorithms for enumerating the vectors in a Gray code (and, in fact, for enumerating everything else natural). Thus, we can keep track of the signed sums for each column with a constant cost for each, for a total of O(n) per product.

References

[Für09] Martin Fürer. Faster integer multiplication. SIAM Journal on Computing, 39(3):979–1005, 2009.
[LN10] Daniel Lokshtanov and Jesper Nederlof. Saving space by algebraization. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pages 321–330. ACM, 2010.

[SS71] Arnold Schönhage and Volker Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7(3–4):281–292, 1971.