The space complexity of approximating the frequency moments
Felix Biermeier
November 24, 2015
Overview
- Introduction
- Approximations of frequency moments
- Lower bounds
Frequency moments

Problem: Estimate F_k = Σ_{i=1}^n m_i^k for k ∈ ℕ in sublinear space.

- m_i = # occurrences of item i (a summary of the data set)
- F_0 = # distinct values
- F_1 = length of the stream
- F_2 = repeat rate

(figure: histogram of the counts m_i over the items 1, 2, ..., n)
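As a baseline, the moments can be computed exactly with one counter per distinct item, i.e. linear space, which is precisely what the streaming algorithms below avoid. A minimal Python sketch:

```python
from collections import Counter

def frequency_moment(stream, k):
    """Compute F_k = sum_i m_i^k exactly, where m_i is the
    number of occurrences of item i in the stream."""
    counts = Counter(stream)
    return sum(m ** k for m in counts.values())

stream = [1, 2, 1, 3, 1, 2]          # m_1 = 3, m_2 = 2, m_3 = 1
print(frequency_moment(stream, 0))   # F_0 = # distinct items = 3
print(frequency_moment(stream, 1))   # F_1 = length of stream = 6
print(frequency_moment(stream, 2))   # F_2 = 9 + 4 + 1 = 14
```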
Case k ≥ 1

Theorem: There exists a randomized algorithm that computes, given a sequence A = (a_1, ..., a_m) of members of N = {1, ..., n}, in one pass and using O(n^{1−1/k} (log n + log m)) memory bits, a number Y such that

    Pr[ |Y − F_k| ≤ λ F_k ] ≥ 1 − ε
Basic idea: median of means

- define random variables Y_i such that
  - the expected value is F_k
  - the variance is relatively small
- apply Chebyshev and Chernoff

Algorithm Estimate F_k
1: for all i do
2:     for all j do
3:         compute X_ij
4:     Y_i ← average of all X_ij
5: output median of all Y_i
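The median-of-means scaffolding can be sketched independently of how X is sampled (sample_x is a hypothetical callable standing in for one computation of an X_ij):

```python
import random
from statistics import median

def median_of_means(sample_x, s1, s2):
    """Average s1 i.i.d. copies of X to shrink the variance
    (Chebyshev step), then take the median of s2 such averages
    to boost the success probability (Chernoff step)."""
    ys = []
    for _ in range(s2):
        xs = [sample_x() for _ in range(s1)]
        ys.append(sum(xs) / s1)
    return median(ys)

# toy usage: recover the mean of a noisy source
random.seed(0)
est = median_of_means(lambda: random.gauss(10, 5), s1=200, s2=9)
```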
Proof - Preconditions

- given a sequence A = (a_1, ..., a_m), a_i ∈ [n]
- random variables Y_1, ..., Y_{s_2}
- random variables X_1, ..., X_{s_1}, i.i.d.
- each X_ij computable in O(log n + log m) space

Algorithm Estimate F_k
for all i do
    for all j do
        compute X_ij
    Y_i ← average of all X_ij
output median of all Y_i
Proof - Computation of X

- choose an index p ∈ {1, ..., m} uniformly at random
- track a_p in A from position p onward, i.e. subsequently set
  r = # occurrences of a_p among the a_q with q ≥ p
- define X = m (r^k − (r−1)^k)

Sequence A: a_1  a_2  ...  a_p  ...  a_{m−1}  a_m
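A sketch of one basic estimator X in Python (for simplicity m is taken as known and the stream is indexable; a genuine one-pass implementation would pick p via reservoir sampling):

```python
import random

def estimate_X(stream, k):
    """Pick a uniformly random position p, count the occurrences r of
    a_p from position p to the end, return X = m*(r^k - (r-1)^k).
    Only p, a_p and r need to be stored: O(log n + log m) bits."""
    m = len(stream)
    p = random.randrange(m)
    item, r = stream[p], 0
    for q in range(p, m):       # scan forward from position p
        if stream[q] == item:
            r += 1
    return m * (r ** k - (r - 1) ** k)

# for the stream (1, 1) and k = 2: X is 6 (p = 0) or 2 (p = 1),
# so E[X] = (6 + 2) / 2 = 4 = F_2
```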
Proof - Expectation of X

E[X] = Σ_{i=1}^n Σ_{j=1}^{m_i} (1/m) · m (j^k − (j−1)^k)

Consider a fixed item i; the inner sum telescopes:

1^k + (2^k − 1^k) + ... + ((m_i − 1)^k − (m_i − 2)^k) + (m_i^k − (m_i − 1)^k) = m_i^k

Therefore

E[X] = m_1^k + ... + m_n^k = Σ_{i=1}^n m_i^k = F_k
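The telescoping identity can be checked numerically: averaging X over all m equally likely positions p yields F_k exactly (a small sanity check, not part of the original proof):

```python
from collections import Counter

def expected_X(stream, k):
    """E[X] computed exhaustively over all m choices of p."""
    m = len(stream)
    total = 0
    for p in range(m):
        r = stream[p:].count(stream[p])   # occurrences of a_p from p on
        total += m * (r ** k - (r - 1) ** k)
    return total / m

def F(stream, k):
    """Exact frequency moment for comparison."""
    return sum(c ** k for c in Counter(stream).values())

s = [1, 2, 1, 3, 1, 2]
print(expected_X(s, 3), F(s, 3))   # both equal 36
```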
Proof - Variance of X

Consider the definition Var[X] = E[X²] − E[X]².

Similar to the last slide,

E[X²] ≤ k F_1 F_{2k−1}
Proof - Variance of Y

Observation:

E[Y] = E[ (1/s_1) Σ_{i=1}^{s_1} X_i ] = (1/s_1) Σ_{i=1}^{s_1} E[X_i] = F_k = E[X]

Therefore

Var[Y] = Var[X] / s_1 ≤ k n^{1−1/k} F_k² / s_1
Proof - Probability for a single Y

Keep in mind E[Y] = E[X] = F_k. Applying Chebyshev's inequality (with s_1 = 8 k n^{1−1/k} / λ²):

Pr[ |Y − F_k| > λ F_k ] ≤ Var[Y] / (λ F_k)² ≤ 1/8
Proof - Probability for the median of all Y_i

Define a bad event:

Z_i = 1  ⟺  |Y_i − F_k| > λ F_k,    Z = Σ_{i=1}^{s_2} Z_i

Therefore

E[Z] = Σ_{i=1}^{s_2} E[Z_i] ≤ s_2 / 8

By choosing δ = 3 and μ = s_2 / 8, the Chernoff bound supplies

Pr[ Z ≥ s_2 / 2 ] ≤ (e³ / 4⁴)^{s_2 / 8} ≤ ε   for s_2 = O(log(1/ε)), 0 < ε < 1

If fewer than half of the Y_i are bad, the median is within λ F_k of F_k.
Case k = 2

Theorem: There exists a randomized algorithm that computes, given a sequence A = (a_1, ..., a_m) of members of N = {1, ..., n}, in one pass and using O(log n + log m) memory bits, a number Y such that

    Pr[ |Y − F_2| ≤ λ F_2 ] ≥ 1 − ε
Basic idea

- similar structure to the proof before
- linear sketch: X = ( Σ_{i=1}^n ε_i m_i )², with ε_i ∈ {−1, +1}
- use four-wise independent random variables ε_i
- space complexity
  - before: O(√n (log n + log m))
  - now: O(log n + log m)
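The sketch can be illustrated as follows. For simplicity the signs ε_i here are fully independent and stored explicitly; the point of the AMS construction is that four-wise independent signs generated from O(log n) random bits suffice, which is what brings the space down to O(log n + log m):

```python
import random
from statistics import median

def f2_estimate(stream, n, s1=50, s2=5, seed=0):
    """Tug-of-war sketch for F_2: each basic estimator keeps a single
    counter Z = sum_i eps_i * m_i with random signs eps_i in {-1, +1};
    Z**2 is an unbiased estimator of F_2. Averages of s1 copies are
    combined by a median over s2 groups."""
    rng = random.Random(seed)
    ys = []
    for _ in range(s2):
        signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(s1)]
        zs = [0] * s1
        for a in stream:                  # one pass over the stream
            for j in range(s1):
                zs[j] += signs[j][a]      # Z_j += eps_j(a)
        ys.append(sum(z * z for z in zs) / s1)
    return median(ys)

stream = [0] * 3 + [1] * 2 + [2]          # F_2 = 9 + 4 + 1 = 14
print(f2_estimate(stream, n=3))           # close to 14
```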
Necessity of randomization

Proposition: For any nonnegative integer k ≠ 1, any deterministic algorithm that outputs, given a sequence A of n/2 elements of N = {1, ..., n}, a number Y such that

    |Y − F_k| ≤ 0.1 F_k

must use Ω(n) memory bits.
Basic idea

- specific family of subsets of N
- two different input sequences
- compare memory configurations
- apply the pigeonhole principle

(figure: memory configurations A(G_1, G_1) and A(G_2, G_1))
F_∞

Definition: F_∞ = max_{1 ≤ i ≤ n} m_i

Theorem: Any randomized algorithm that outputs, given a sequence A of at most 2n elements of N = {1, ..., n}, a number Y such that

    Pr[ |Y − F_∞| ≤ F_∞ / 3 ] ≥ 1 − ε

for some fixed ε < 1/2 must use Ω(n) memory bits.
Basic idea

Disjointness problem DIS_n(x, y)
- boolean function on the set N = {1, ..., n}
- two players with inputs x resp. y
- x, y ∈ {0, 1}^n characterize subsets N_x, N_y of N
- output 1 iff N_x ∩ N_y ≠ ∅

Reduce DIS_n to F_∞ = max_{1 ≤ i ≤ n} m_i
- a lower bound for DIS_n is known
- define a communication protocol that computes DIS_n
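The reduction can be illustrated by simulating the one-way protocol. This is an illustration only: counts stands in for the streaming algorithm's memory, which is exactly the message Alice sends to Bob. Since every element appears at most twice, F_∞ = 2 precisely when N_x and N_y intersect:

```python
def dis_via_streaming(x, y, n):
    """Alice streams her elements through the algorithm, ships the
    memory state to Bob, and Bob finishes the pass with his elements.
    DIS_n(x, y) = 1 iff the resulting F_inf equals 2."""
    counts = {}                       # stands in for the algorithm's memory

    def feed(bits):
        for i in range(n):
            if bits[i]:
                counts[i] = counts.get(i, 0) + 1

    feed(x)                           # Alice's pass
    # -- memory state `counts` is the message sent to Bob --
    feed(y)                           # Bob's pass
    f_inf = max(counts.values(), default=0)
    return 1 if f_inf == 2 else 0

print(dis_via_streaming([1, 0, 1], [0, 1, 1], 3))  # {0,2} and {1,2} meet -> 1
print(dis_via_streaming([1, 0, 0], [0, 1, 1], 3))  # disjoint -> 0
```

A low-memory F_∞ algorithm would therefore give a low-communication protocol for DIS_n, contradicting its known Ω(n) communication lower bound.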
Case k > 5

Theorem: For any fixed k > 5 and δ < 1/2, any randomized algorithm that outputs, given an input sequence A of at most n elements of N = {1, ..., n}, a number Z_k such that

    Pr[ |Z_k − F_k| ≤ 0.1 F_k ] ≥ 1 − δ

uses at least Ω(n^{1−5/k}) memory bits.
Yao's Minimax Principle

The worst-case expected cost of the best randomized algorithm is at least the expected cost of the best deterministic algorithm against a suitably chosen distribution of inputs.

Here: to prove a lower bound for randomized algorithms, show that no deterministic algorithm performs well on inputs drawn from a certain distribution.