Find the Missing Number or Numbers: An Exposition

William Gasarch
Univ. of MD at College Park
(University of Maryland, College Park, MD 20742, gasarch@cs.umd.edu)

1 Introduction

The following is a classic problem in streaming algorithms and often the first one taught. Assume $n$ is large and $k$ is constant. Alice is going to say all but $k$ of the numbers in the set $\{1, 2, \ldots, n\}$ in some order. Bob will listen and try to discern what the $k$ missing numbers are. If Bob's brain could easily store and access $n$ bits then he would be able to store a bit vector and mark each number as it came in, then scan the bit vector for the $k$ missing numbers. But what if Bob's brain can only store $m \ll n$ bits?

This can be presented as a fun math puzzle, and for $k = 1$ and even $k = 2$ the answer is fun. Is it fun for $k = 3$? $k \ge 4$? I leave that as an exercise for the reader. We present solutions for $k = 1$, $k = 2$, $k = 3$ and $k \ge 4$.

2 Find the Missing Number

Alice is going to say all but one of the numbers in the set $\{1, 2, \ldots, n\}$ in some order. Bob will listen and try to discern what the missing number is. Alice says the numbers $x_1, x_2, \ldots, x_{n-1}$. They are all distinct elements from $\{1, \ldots, n\}$ but one is missing. Let $y$ be the missing number.

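Bob can do this problem storing just $O(\log n)$ bits. As Bob hears the numbers he maintains the SUM. This takes just $O(\log n)$ bits. At the end he has $\sum_{i \le n-1} x_i$.

2.1 Solution Using the Sum

Note that
\[ \sum_{i \le n-1} x_i = \Big(\sum_{i \le n} i\Big) - y = \frac{n(n+1)}{2} - y. \]
Bob finds the missing number is
\[ y = \frac{n(n+1)}{2} - \sum_{i \le n-1} x_i. \]

Note 2.1. $\frac{n(n+1)}{2}$ has size $2 \lg n$, hence this algorithm takes space $2 \lg n$. Can we do better? Yes! Realize that the final answer is between $1$ and $n$. Hence if we did all calculations mod $n$ we would get the same answer (equating $0$ with $n$). If $n$ is odd then $\frac{n(n+1)}{2} \equiv 0 \pmod n$. Hence
\[ y = \frac{n(n+1)}{2} - \sum_{i \le n-1} x_i \equiv -\sum_{i \le n-1} x_i \pmod n. \]
Hence Bob can compute $\sum_{i \le n-1} x_i \pmod n$, which takes $\lg n$ bits. He can then subtract it from $n$ (can this be done in $\lg n$ bits?) and get the answer, only using $\lg n$ bits. If $n$ is even then use mod $n + 1$: since $n + 1$ is odd, $\frac{n(n+1)}{2} = \frac{n}{2}(n+1) \equiv 0 \pmod{n+1}$.

2.2 Solution Using XOR

An alternative solution: View the numbers $x_1, \ldots, x_{n-1}$ as $\lg n$-bit strings. After seeing $x_1, \ldots, x_L$ Bob maintains $x_1 \oplus x_2 \oplus \cdots \oplus x_L$. One can show that the final string Bob has, $x_1 \oplus \cdots \oplus x_{n-1}$, XORed with $1 \oplus 2 \oplus \cdots \oplus n$ (which does not depend on the data), IS the missing number. When $n \equiv 3 \pmod 4$ we have $1 \oplus 2 \oplus \cdots \oplus n = 0$, so in that case the final string itself is the missing number.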
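Both one-number solutions can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `missing_by_sum` and `missing_by_xor` are hypothetical, the stream is passed as an iterable, and the XOR version folds in $1 \oplus \cdots \oplus n$, which is needed whenever that value is not $0$.

```python
from functools import reduce
from operator import xor


def missing_by_sum(stream, n):
    """Recover the one number of 1..n absent from stream, keeping a sum mod m."""
    # As in the text: reduce mod n when n is odd and mod n + 1 when n is even,
    # so that n(n+1)/2 is congruent to 0 and the counter needs about lg n bits.
    m = n if n % 2 == 1 else n + 1
    total = 0
    for x in stream:
        total = (total + x) % m
    y = (-total) % m
    # Residue 0 stands for n (this can only happen when m == n).
    return y if y != 0 else n


def missing_by_xor(stream, n):
    """Recover the one number of 1..n absent from stream, keeping a running XOR."""
    acc = 0
    for x in stream:
        acc ^= x
    # XOR of 1..n can be computed without looking at the data.
    full = reduce(xor, range(1, n + 1), 0)
    return acc ^ full
```

For instance, `missing_by_sum([1, 2, 3, 5, 6, 7, 8, 9], 9)` keeps the running sum mod 9 and returns 4.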

3 Find the Missing Two Numbers

Alice is going to say all but two of the numbers in the set $\{1, 2, \ldots, n\}$ in some order. Bob will listen and try to discern which two numbers are missing. We denote the missing numbers $y_1, y_2$. We give three solutions that use $O(\log n)$ bits.

3.1 Solution that Uses the Quadratic Formula

As Bob hears the numbers he maintains the SUM and SUM OF SQUARES. At the end Bob has
\[ \sum_{i \le n-2} x_i \quad\text{and}\quad \sum_{i \le n-2} x_i^2. \]
Since Bob can calculate and store $\sum_{i \le n} i$ and $\sum_{i \le n} i^2$ he can deduce
\[ s = y_1 + y_2 = \sum_{i \le n} i - \sum_{i \le n-2} x_i, \]
\[ t = y_1^2 + y_2^2 = \sum_{i \le n} i^2 - \sum_{i \le n-2} x_i^2. \]
We derive $y_1, y_2$ from $s, t$ as follows.
\[ y_2 = s - y_1 \]
\[ t = y_1^2 + (s - y_1)^2 = 2y_1^2 - 2sy_1 + s^2 \]
\[ 2y_1^2 - 2sy_1 + s^2 - t = 0 \]
Now use the quadratic formula to find $y_1$ and then $y_2 = s - y_1$ to find $y_2$.

Note 3.1. The above solution takes $3 \lg n + 2 \lg n = 5 \lg n$ space since we need to store a sum of size $O(n^3)$ and a sum of size $O(n^2)$. We can do better! We can't use mod $n$ since the quadratic might have more than 2 roots mod $n$. Let $p$ be a prime such that $n \le p \le 2n$. Doing all of the above mod $p$ works. This will take $\lg(2n) + \lg(2n) \le 2\lg(n) + O(1)$.
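The quadratic-formula solution can be sketched as follows. This is a minimal illustration, not the space-optimized version: it uses exact integers rather than arithmetic mod $p$, and the function name `missing_two_quadratic` is our own. It relies on the fact that the quadratic $2y_1^2 - 2sy_1 + s^2 - t = 0$ has roots $(s \pm \sqrt{2t - s^2})/2$, where $2t - s^2 = (y_1 - y_2)^2$.

```python
from math import isqrt


def missing_two_quadratic(stream, n):
    """Recover the two numbers of 1..n absent from stream via sum and sum of squares."""
    sum_x = 0
    sum_sq = 0
    for x in stream:
        sum_x += x
        sum_sq += x * x
    # s = y1 + y2 and t = y1^2 + y2^2, using the closed forms for 1..n.
    s = n * (n + 1) // 2 - sum_x
    t = n * (n + 1) * (2 * n + 1) // 6 - sum_sq
    # Roots of 2y^2 - 2sy + (s^2 - t) = 0 are (s +/- d) / 2 with d = |y1 - y2|.
    d = isqrt(2 * t - s * s)
    y1 = (s - d) // 2
    y2 = s - y1
    return y1, y2
```

With $n = 10$ and the numbers 3 and 7 missing, Bob ends with $s = 10$ and $t = 58$, so $d = 4$ and the function returns `(3, 7)`.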

3.2 Solution that Uses Sums of Powers

This solution is identical to the one in Section 3.1 up until we find $s, t$. We then find $y_1 y_2$ as follows:
\[ s^2 - t = 2 y_1 y_2. \]
Let $s = y_1 + y_2$ and $p = y_1 y_2$. Form the polynomial
\[ X^2 - sX + p = X^2 - (y_1 + y_2)X + y_1 y_2 = (X - y_1)(X - y_2). \]
Find the roots of this polynomial. We call this THE POLY-ROOTS TRICK throughout. Note that all we need is the symmetric functions of $y_1, y_2$. More generally we will need the symmetric functions of $y_1, y_2, \ldots, y_k$.

Note 3.2. Similar to the solution in Section 3.1, we can do all of this mod $p$ (for a prime $p$ with $n \le p \le 2n$) and hence in space $2\lg(n) + O(1)$.

3.3 Solution that uses Symmetric Functions Throughout

As Bob hears the numbers he maintains the SUM and the SUM OF PRODUCTS OF PAIRS. After hearing the first $L$ numbers he has in his head
\[ \sum_{1 \le i \le L} x_i \quad\text{AND}\quad \sum_{1 \le i < j \le L} x_i x_j. \]
We need to show that he can actually do this. Let
\[ s_0^L(x_1, \ldots, x_L) = 1 \quad \text{(We don't really need $s_0$ but it will make the notation nice.)} \]
\[ s_1^L(x_1, \ldots, x_L) = \sum_{1 \le i \le L} x_i \]
\[ s_2^L(x_1, \ldots, x_L) = \sum_{1 \le i < j \le L} x_i x_j \]
For notational convenience we use $s_i^L$ to mean $s_i^L(x_1, \ldots, x_L)$.

Assume Bob has $s_0^{L-1}$, $s_1^{L-1}$ and $s_2^{L-1}$. And then Bob sees $x_L$. Bob wants $s_0^L$, $s_1^L$, $s_2^L$. We explain all of this by expanding everything out.

$s_0^L = 1$. That's easy.

\[ s_1^L = (x_1 + \cdots + x_{L-1}) + x_L = s_1^{L-1} + x_L. \]
We rewrite this as
\[ s_1^L = (x_1 + \cdots + x_{L-1}) + x_L = s_1^{L-1} + x_L s_0^{L-1} \]
since this way all of the equations (and more so for the $k = 3$ and $k \ge 4$ cases) look the same.

\[ s_2^L = \sum_{1 \le i < j \le L} x_i x_j \]
We break this sum up into two parts: those terms that use $x_L$ and those terms that do not. If a product of two $x_i$ terms uses $x_L$ then it is of the form $x_i x_L$. Hence
\[ s_2^L = \sum_{1 \le i < j \le L-1} x_i x_j + x_L(x_1 + \cdots + x_{L-1}). \]
AH, note that $\sum_{1 \le i < j \le L-1} x_i x_j$ is $s_2^{L-1}$ and $x_1 + \cdots + x_{L-1}$ is $s_1^{L-1}$. So we write this as
\[ s_2^L = s_2^{L-1} + x_L s_1^{L-1}. \]
We now write all of the equations together:

\[ s_0^L = s_0^{L-1} \]
\[ s_1^L = s_1^{L-1} + x_L s_0^{L-1} \]
\[ s_2^L = s_2^{L-1} + x_L s_1^{L-1} \]
Hence we can keep the counters $s_1$ and $s_2$ and update them easily, using only $O(\log n)$ space. At the end we have $s_1^{n-2}$ and $s_2^{n-2}$.

KEY: Bob can compute, before seeing any of the data:
\[ s_1^n = s_1^n(1, \ldots, n) = \sum_{1 \le i \le n} i \]
\[ s_2^n = s_2^n(1, \ldots, n) = \sum_{1 \le i < j \le n} ij \]
We want to derive $s_1^2(y_1, y_2) = y_1 + y_2$ and $s_2^2(y_1, y_2) = y_1 y_2$ and then finish up the proof as we did in Section 3.2. For notational convenience we denote $s_1^2(y_1, y_2)$ by $s_1^2$ and $s_2^2(y_1, y_2)$ by $s_2^2$. Note that
\[ s_1^n = s_1^{n-2} + s_1^2 \]
\[ s_2^n = s_2^{n-2} + s_1^{n-2} s_1^2 + s_2^2 \]
Note that $s_1^n$, $s_2^n$, $s_1^{n-2}$, $s_2^{n-2}$ are known. Hence $s_1^2 = y_1 + y_2$ and $s_2^2 = y_1 y_2$ can be determined. Now do the poly-roots trick.

4 Find the Missing Three Numbers

Alice is going to say all but three of the numbers in the set $\{1, 2, \ldots, n\}$ in some order. Bob will listen and try to discern which three numbers are missing. We denote the missing numbers $y_1, y_2, y_3$. We give two solutions.
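Before the three-number solutions, a small numeric instance of the two-number symmetric-function method of Section 3.3 may help fix the notation. The instance ($n = 5$, with 2 and 4 missing and Alice saying $1, 3, 5$ in that order) is our own choice, not one taken from the text.

```latex
% Running counters (s_1, s_2), updated by
%   s_2 <- s_2 + x_L * s_1   and then   s_1 <- s_1 + x_L:
%   start:        (s_1, s_2) = (0, 0)
%   after x = 1:  (s_1, s_2) = (1, 0)
%   after x = 3:  (s_1, s_2) = (4, 0 + 3 \cdot 1) = (4, 3)
%   after x = 5:  (s_1, s_2) = (9, 3 + 5 \cdot 4) = (9, 23)
\begin{align*}
s_1^{3} &= 9, & s_2^{3} &= 23 && \text{(what Bob heard, } n - 2 = 3 \text{ numbers)}\\
s_1^{5} &= 15, & s_2^{5} &= 85 && \text{(computed from } 1,\ldots,5 \text{ alone)}\\
s_1^{2} &= s_1^{5} - s_1^{3} = 6, &
s_2^{2} &= s_2^{5} - s_2^{3} - s_1^{3} s_1^{2} = 85 - 23 - 54 = 8\\
X^2 - 6X + 8 &= (X - 2)(X - 4)
\end{align*}
```

The roots 2 and 4 are exactly the two numbers Alice skipped.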

4.1 Solution Using Sums of Powers

Bob keeps track of the sum of terms, squares of terms, and cubes of terms. By subtracting them from the known quantities $\sum_{i=1}^{n} i$, $\sum_{i=1}^{n} i^2$, and $\sum_{i=1}^{n} i^3$ Bob obtains:
\[ y_1 + y_2 + y_3 \]
\[ y_1^2 + y_2^2 + y_3^2 \]
\[ y_1^3 + y_2^3 + y_3^3 \]
Can he use these to determine $y_1, y_2, y_3$? We WANT to obtain:
\[ y_1 + y_2 + y_3 \quad \text{(that's easy!)} \]
\[ y_1 y_2 + y_1 y_3 + y_2 y_3 \]
\[ y_1 y_2 y_3 \]
ONCE we have them we form the polynomial
\[ X^3 - (y_1 + y_2 + y_3)X^2 + (y_1 y_2 + y_1 y_3 + y_2 y_3)X - y_1 y_2 y_3 = (X - y_1)(X - y_2)(X - y_3). \]
Find its roots. Then you have $y_1, y_2, y_3$.

OKAY, now to find those functions of $y_1, y_2, y_3$. We first try an intuitive thing:
\[ (y_1 + y_2 + y_3)^2 - (y_1^2 + y_2^2 + y_3^2) \]
This is intuitive to try since we can already see that all of the square terms will drop out and might leave us with something simple.

\[ (y_1 + y_2 + y_3)^2 - (y_1^2 + y_2^2 + y_3^2) = 2y_1 y_2 + 2y_1 y_3 + 2y_2 y_3 = 2(y_1 y_2 + y_1 y_3 + y_2 y_3). \]
GREAT! That last term is twice what we want. In other words:
\[ y_1 y_2 + y_1 y_3 + y_2 y_3 = \frac{(y_1 + y_2 + y_3)^2 - (y_1^2 + y_2^2 + y_3^2)}{2}. \]
NOW we want $y_1 y_2 y_3$. The next equation is harder to motivate so I won't even try (though it's a special case of Newton's identities, which I will discuss when doing the general $k$ case):
\[ y_1 y_2 y_3 = \frac{(y_1 y_2 + y_1 y_3 + y_2 y_3)(y_1 + y_2 + y_3) - (y_1 + y_2 + y_3)(y_1^2 + y_2^2 + y_3^2) + (y_1^3 + y_2^3 + y_3^3)}{3}. \]
For example, with $(y_1, y_2, y_3) = (1, 2, 3)$ the right-hand side is $(11 \cdot 6 - 6 \cdot 14 + 36)/3 = 6 = 1 \cdot 2 \cdot 3$.

Great! Now that we have $y_1 + y_2 + y_3$, $y_1 y_2 + y_1 y_3 + y_2 y_3$, and $y_1 y_2 y_3$, we can do the poly-roots trick.

4.2 Solution that uses Symmetric Functions Throughout

This solution is due to Y. Minsky, A. Trachtenberg, and R. Zippel [1]. The KEY to the solution in the last section was that we used the sums-of-powers to obtain the symmetric functions $y_1 + y_2 + y_3$, $y_1 y_2 + y_1 y_3 + y_2 y_3$, $y_1 y_2 y_3$. In this solution we get the symmetric functions more directly. As Bob hears the first $L$ numbers $x_1, \ldots, x_L$ he maintains:

1. $\sum_{1 \le i \le L} x_i$.
2. $\sum_{1 \le i < j \le L} x_i x_j$.
3. $\sum_{1 \le i < j < k \le L} x_i x_j x_k$.

We need to show that he can actually do this. Let
\[ s_0^L(x_1, \ldots, x_L) = 1 \quad \text{(We have this just so the equations look nice.)} \]
\[ s_1^L(x_1, \ldots, x_L) = \sum_{1 \le i \le L} x_i \]
\[ s_2^L(x_1, \ldots, x_L) = \sum_{1 \le i < j \le L} x_i x_j \]
\[ s_3^L(x_1, \ldots, x_L) = \sum_{1 \le i < j < k \le L} x_i x_j x_k \]
For notational convenience we use $s_i^L$ to mean $s_i^L(x_1, \ldots, x_L)$.

We need to show that Bob can easily get $s_0^L, s_1^L, s_2^L, s_3^L$ from $s_0^{L-1}, s_1^{L-1}, s_2^{L-1}, s_3^{L-1}$ and $x_L$.

$s_0^L = 1$ so that's easy.

\[ s_1^L = (x_1 + \cdots + x_{L-1}) + x_L = s_1^{L-1} + x_L = s_1^{L-1} + x_L s_0^{L-1}. \]
SO we now have $s_1^L$ in terms of stuff Bob knows.

Consider
\[ s_2^L = \sum_{1 \le i < j \le L} x_i x_j. \]
We separate out the pairs that involve $x_L$ from the ones that don't. The ones that don't involve $x_L$ are just
\[ s_2^{L-1} = \sum_{1 \le i < j \le L-1} x_i x_j. \]
The ones that DO involve $x_L$ involve just $x_L$ and some $x_i$ with $i < L$. That's just
\[ x_1 x_L + x_2 x_L + \cdots + x_{L-1} x_L = x_L(x_1 + \cdots + x_{L-1}) = x_L s_1^{L-1}. \]
Hence
\[ s_2^L = s_2^{L-1} + x_L s_1^{L-1}. \]
SO we now have $s_2^L$ in terms of stuff Bob knows.

$s_3^L$ is similar and we leave it to the reader. To summarize we have:

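\[ s_0^L = s_0^{L-1} \]
\[ s_1^L = s_1^{L-1} + x_L s_0^{L-1} \]
\[ s_2^L = s_2^{L-1} + x_L s_1^{L-1} \]
\[ s_3^L = s_3^{L-1} + x_L s_2^{L-1} \]
Hence we can keep the counters $s_1, s_2, s_3$ and update them easily, using only $O(\log n)$ space. At the end we have $s_1^{n-3}$ and $s_2^{n-3}$ and $s_3^{n-3}$.

KEY: Bob can compute, before seeing any of the data:
\[ s_0^n = 1 \]
\[ s_1^n = s_1^n(1, \ldots, n) = \sum_{1 \le i \le n} i \]
\[ s_2^n = s_2^n(1, \ldots, n) = \sum_{1 \le i < j \le n} ij \]
\[ s_3^n = s_3^n(1, \ldots, n) = \sum_{1 \le i < j < k \le n} ijk \]
We want to derive $s_1^3(y_1, y_2, y_3) = y_1 + y_2 + y_3$ and $s_2^3(y_1, y_2, y_3) = y_1 y_2 + y_1 y_3 + y_2 y_3$ and $s_3^3(y_1, y_2, y_3) = y_1 y_2 y_3$. We can then finish up the proof using the poly-roots trick. For notational convenience we denote $s_i^3(y_1, y_2, y_3)$ by $s_i^3$. In what follows we write $x_{n-2}, x_{n-1}, x_n$ for the missing numbers $y_1, y_2, y_3$, so that $x_1, \ldots, x_n$ is all of $\{1, \ldots, n\}$.

For $s_1$ it is easy to relate $s_1^n$, $s_1^{n-3}$, and $s_1^3$ since
\[ s_1^n(x_1, \ldots, x_n) = x_1 + \cdots + x_n = (x_1 + \cdots + x_{n-3}) + (y_1 + y_2 + y_3) = s_1^{n-3} + s_1^3. \]
Hence we can derive $s_1^3$ from $s_1^n$ and $s_1^{n-3}$, both of which we know.

Consider $s_2^n = \sum_{1 \le i < j \le n} x_i x_j$. We break this into pieces.

Some pairs use NO elements of $y_1, y_2, y_3$. That would be $s_2^{n-3} = \sum_{1 \le i < j \le n-3} x_i x_j$.

Some pairs use $y_1$ and some element of $\{x_1, \ldots, x_{n-3}\}$. That would be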

\[ y_1(x_1 + \cdots + x_{n-3}) = y_1 s_1^{n-3}. \]
Some pairs use $y_2$ and some element of $\{x_1, \ldots, x_{n-3}\}$. That would be
\[ y_2(x_1 + \cdots + x_{n-3}) = y_2 s_1^{n-3}. \]
Some pairs use $y_3$ and some element of $\{x_1, \ldots, x_{n-3}\}$. That would be
\[ y_3(x_1 + \cdots + x_{n-3}) = y_3 s_1^{n-3}. \]
The sum of the last three cases is
\[ (y_1 + y_2 + y_3)(x_1 + \cdots + x_{n-3}) = (x_1 + \cdots + x_{n-3})(y_1 + y_2 + y_3) = s_1^{n-3} s_1^3. \]
Some pairs use two elements from $\{y_1, y_2, y_3\}$. That would be
\[ y_1 y_2 + y_1 y_3 + y_2 y_3 = s_2^3. \]
If you put this all together you get:
\[ s_2^n = s_2^{n-3} s_0^3 + s_1^{n-3} s_1^3 + s_0^{n-3} s_2^3. \]
A similar equation for $s_3^n$ can also be derived; however, we leave that for the reader. To summarize we have in total:
\[ s_1^n = s_1^{n-3} s_0^3 + s_0^{n-3} s_1^3 \]
\[ s_2^n = s_2^{n-3} s_0^3 + s_1^{n-3} s_1^3 + s_0^{n-3} s_2^3 \]
\[ s_3^n = s_3^{n-3} s_0^3 + s_2^{n-3} s_1^3 + s_1^{n-3} s_2^3 + s_0^{n-3} s_3^3 \]
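Since $s_0^3 = s_0^{n-3} = 1$, the three summary equations form a triangular system, and it may help to see it solved explicitly. The back-substitution below is our own rearrangement of those equations, not a formula quoted from the text.

```latex
\begin{align*}
s_1^{3} &= s_1^{n} - s_1^{n-3}\\
s_2^{3} &= s_2^{n} - s_2^{n-3} - s_1^{n-3}\, s_1^{3}\\
s_3^{3} &= s_3^{n} - s_3^{n-3} - s_2^{n-3}\, s_1^{3} - s_1^{n-3}\, s_2^{3}
\end{align*}
```

Each line uses only known quantities and the values found on the lines above it, so Bob can compute them in order.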

Note that $s_1^n$, $s_2^n$, $s_3^n$, $s_1^{n-3}$, $s_2^{n-3}$, $s_3^{n-3}$, $s_0^{n-3}$, $s_0^3$ are known. Hence $s_1^3$, $s_2^3$, $s_3^3$ can all be derived. Now we can do the poly-roots trick.

5 General k

We'll need the symmetric functions for both solutions.

Notation 5.1.
\[ s_0^L(x_1, \ldots, x_L) = 1 \]
\[ s_1^L(x_1, \ldots, x_L) = \sum_{1 \le i \le L} x_i \]
\[ s_2^L(x_1, \ldots, x_L) = \sum_{1 \le i_1 < i_2 \le L} x_{i_1} x_{i_2} \]
\[ s_3^L(x_1, \ldots, x_L) = \sum_{1 \le i_1 < i_2 < i_3 \le L} x_{i_1} x_{i_2} x_{i_3} \]
\[ \vdots \]
\[ s_k^L(x_1, \ldots, x_L) = \sum_{1 \le i_1 < \cdots < i_k \le L} x_{i_1} \cdots x_{i_k} \]
We often leave out the arguments for notational convenience.

We give two solutions. The arguments used here are similar to the ones used in the $k = 3$ case so we omit them.

5.1 Solution Using Sums of Powers

Bob keeps track of the sums of powers, up to the $k$th power. At the end he has
\[ \sum_{i=1}^{n-k} x_i, \quad \sum_{i=1}^{n-k} x_i^2, \quad \ldots, \quad \sum_{i=1}^{n-k} x_i^k. \]
From these Bob can easily derive

\[ \sum_{i=1}^{k} y_i, \quad \sum_{i=1}^{k} y_i^2, \quad \ldots, \quad \sum_{i=1}^{k} y_i^k. \]
We find the symmetric functions FROM these. How? By using Newton's identities. We abbreviate $s_m^k(y_1, \ldots, y_k)$ by $s_m^k$. We abbreviate $\sum_{j=1}^{k} y_j^i$ by $p_i$. Newton's identities state that, for $1 \le m \le k$,
\[ m\, s_m^k(y_1, \ldots, y_k) = \sum_{i=1}^{m} (-1)^{i-1} s_{m-i}^k(y_1, \ldots, y_k) \sum_{j=1}^{k} y_j^i, \]
or, in abbreviated form,
\[ m\, s_m^k = \sum_{i=1}^{m} (-1)^{i-1} s_{m-i}^k\, p_i. \]

5.2 Solution Using Symm Functions Throughout

This algorithm is essentially from a paper by Yaron Minsky, Ari Trachtenberg, and Richard Zippel [1]. Using an argument similar to the one in Section 3.3 we can obtain:

Lemma 5.2.
\[ s_0^L = s_0^{L-1} \]
\[ s_1^L = s_1^{L-1} + x_L s_0^{L-1} \]
\[ s_2^L = s_2^{L-1} + x_L s_1^{L-1} \]
\[ s_3^L = s_3^{L-1} + x_L s_2^{L-1} \]
\[ \vdots \]
\[ s_k^L = s_k^{L-1} + x_L s_{k-1}^{L-1} \]
Hence if Bob has $s_1^{L-1}, \ldots, s_k^{L-1}$, and then sees $x_L$, he will be able to calculate $s_1^L, \ldots, s_k^L$. If all of the $x_i$ are in $\{1, \ldots, n\}$ then he can do all of this in space $O(k \log n)$. Therefore Bob can find $s_1^{n-k}, \ldots, s_k^{n-k}$.
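The recurrence in the lemma translates directly into a streaming update. The sketch below is an illustration with hypothetical names (`update`, `symmetric_counters`); it keeps the counters as exact Python integers in a list `s` with `s[j]` playing the role of $s_j^L$, and it does not attempt the mod $p$ space saving.

```python
def update(s, x):
    """Fold one newly heard number x into the counters s[0..k], in place."""
    # Walk from the highest index down so that s[j - 1] on the right-hand side
    # still holds the old value s_{j-1}^{L-1} when s[j] is updated.
    for j in range(len(s) - 1, 0, -1):
        s[j] += x * s[j - 1]
    # s[0] stays 1, matching s_0^L = s_0^{L-1}.


def symmetric_counters(stream, k):
    """Return [s_0, s_1, ..., s_k] for the numbers in stream."""
    s = [1] + [0] * k
    for x in stream:
        update(s, x)
    return s
```

For example, `symmetric_counters([1, 3, 5], 2)` passes through `[1, 1, 0]` and `[1, 4, 3]` and ends at `[1, 9, 23]`. Updating from the top index down is what lets a single list serve as both the old and the new row of counters.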

KEY: Bob can compute the following independent of the data. He will do this after he knows $s_1^{n-k}, \ldots, s_k^{n-k}$, and use them one-at-a-time below to save space.
\[ s_1^n = \sum_{1 \le i \le n} i \]
\[ s_2^n = \sum_{1 \le i_1 < i_2 \le n} i_1 i_2 \]
\[ s_3^n = \sum_{1 \le i_1 < i_2 < i_3 \le n} i_1 i_2 i_3 \]
\[ \vdots \]
\[ s_k^n = \sum_{1 \le i_1 < \cdots < i_k \le n} i_1 i_2 \cdots i_k \]
We seek, for all $1 \le i \le k$, $s_i^k(y_1, \ldots, y_k)$. We denote these $s_i^k$ for notational convenience. The following are easily seen to be true:
\[ s_1^n = s_1^{n-k} s_0^k + s_0^{n-k} s_1^k \]
\[ s_2^n = s_2^{n-k} s_0^k + s_1^{n-k} s_1^k + s_0^{n-k} s_2^k \]
\[ s_3^n = s_3^{n-k} s_0^k + s_2^{n-k} s_1^k + s_1^{n-k} s_2^k + s_0^{n-k} s_3^k \]
\[ \vdots \]
\[ s_i^n = s_i^{n-k} s_0^k + s_{i-1}^{n-k} s_1^k + s_{i-2}^{n-k} s_2^k + \cdots + s_0^{n-k} s_i^k \]
\[ \vdots \]
\[ s_k^n = s_k^{n-k} s_0^k + s_{k-1}^{n-k} s_1^k + s_{k-2}^{n-k} s_2^k + \cdots + s_0^{n-k} s_k^k \]
Note that Bob knows $s_i^n$, $s_i^{n-k}$, $s_0^k$, $s_0^{n-k}$. Hence Bob can use these equations to, one at a time (to save space), find $s_1^k, s_2^k, \ldots, s_k^k$. Bob forms the polynomial
\[ X^k - s_1^k X^{k-1} + s_2^k X^{k-2} - \cdots + (-1)^k s_k^k = (X - y_1)(X - y_2) \cdots (X - y_k). \]
Find the roots.
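Putting the pieces of this section together gives the following end-to-end sketch. Several choices here are our assumptions rather than the text's: the function names are hypothetical, all $s_i^n$ are computed up front instead of one at a time, arithmetic is over exact integers rather than mod $p$, and the roots are found by simply testing every candidate in $\{1, \ldots, n\}$, since the text only says to find the roots.

```python
def symmetric_counters(values, k):
    """Return [s_0, ..., s_k] of the given values via the streaming recurrence."""
    s = [1] + [0] * k
    for x in values:
        for j in range(k, 0, -1):  # high to low keeps the old s[j - 1] available
            s[j] += x * s[j - 1]
    return s


def find_missing(stream, n, k):
    """Recover the k numbers of 1..n that are absent from stream."""
    heard = symmetric_counters(stream, k)          # s_i^{n-k}, from the data
    full = symmetric_counters(range(1, n + 1), k)  # s_i^n, independent of the data
    # Solve s_i^n = sum_{j=0..i} s_{i-j}^{n-k} * s_j^k for s_i^k, one i at a time.
    e = [1] + [0] * k
    for i in range(1, k + 1):
        e[i] = full[i] - sum(heard[i - j] * e[j] for j in range(i))

    # Poly-roots trick: X^k - e_1 X^{k-1} + e_2 X^{k-2} - ... + (-1)^k e_k.
    def poly(X):
        return sum((-1) ** j * e[j] * X ** (k - j) for j in range(k + 1))

    # Assumption: the roots are integers in 1..n, so trying each candidate suffices.
    return [c for c in range(1, n + 1) if poly(c) == 0]
```

For instance, with $n = 10$, $k = 3$ and the numbers 2, 5, 9 missing, the solve step yields $s_1^3 = 16$, $s_2^3 = 73$, $s_3^3 = 90$, and the candidates that zero out $X^3 - 16X^2 + 73X - 90$ are exactly 2, 5 and 9.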

References

[1] Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory, 49(9):2213-2218, 2003.