ECEN 689 Special Topics in Data Science for Communications Networks

Size: px

Start display at page:

Download "ECEN 689 Special Topics in Data Science for Communications Networks"

Ariel Chambers
5 years ago
Views:

1 ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 3 Estimation and Bounds

2 Estimation from Packet Sampled Flow Records!me Sampling Flow records flow 1 flow 2 flow 3 Original flows (key, #packets) = (k,p) Generic subset sum query True value X(K) = #packets in flows with key k K How to estimate from packet sampled flow records Intuition: sampling yields about 1/N th of the original packets Multiply number of sampled packets by N to estimate #original packets Key idea: can apply the idea to any subset of packets Estimate: X(K) by X (K) = N * #packets in sampled flows with key k K

3 Unbiased Estimation Consider a single item (e.g. packet) Equipped with a size x (e.g. #bytes) Sampled with probability q The Horvitz-Thompson Estimator of x is x' =! " # # # x q if##the##item##is##selected 0 otherwise Theorem: x is unbiased estimator of x: Expectation E[x ] = x E[x] = sum of estimate values weighted by their probabilities = Pr[ item sampled] * x/q + Pr[item not sampled] * 0 = q * x / q + (1-q) * 0 = x

4 Subset Sum Estimation with Horvitz-Thompson Set of items with weights {x 1, x 2,, x n } Individual sampling probabilities {q 1, q 2,, q n } Subset sum X(S) = Σ i S x i for any subset S {1,2,,n } Estimate subset sum using sum of constituent estimates X (S) = Σ i S x i where x i = x i / q i if i selected, 0 otherwise Theorem: X (S) is an unbiased estimator of X(S) Proof uses linearity of expectation E[X (S)] = E[Σ i S x i ] = Σ i S E[x i ] = Σ i S x i = X(S) Doesn t require independent sampling More on Horvitz-Thompson: Kolaczyk. Ch. 5.2 See also Duffield review article

5 Subset Sum Estimation for Packets (and Bytes) X(K) = #{ packets in samples flows with key k K} = Σ i S(K) x i, x i = 1; S(K) = packets with key k K For brevity will mostly write X for X(K) and S for S(K) Packets sampled with probability 1/N X = N* #packets in sampled flows with key k K = Σ i S(K) x i with x i = N if packet is sampled, 0 otherwise Recognize X as the HT estimator of X: E[X ] = X Can make similar construction for byte estimation

6 Estimation Accuracy Think of independent 1 in N sampling from X packets Number of samples has binomial B(X, 1/N) distribution Estimate X is distributed as N * B(X, 1/N) i.e. Example: X = 5,000, N =100, 1,000 trials of X! Pr[X'=xN] = # " X x $ & % N x (1 1/ N) x X

7 Accuracy and Bounds HT estimation: weights {x 1,, x n } & probabilities {q 1, q 2,, q n } Initially examine accuracy of subsets sums X = X(K) via variance: Var(X ) = E[ (X X) 2 ] = E[ (X ) 2 ] X 2 Assume independent sampling for the moment: x i are independent Var(X ) = Var(Σ i S x i ) = Σ i S Var( x i ) Compute variance for HT estimate of individuals x i : Exercise: Var( x i ) = x i 2 (1/p i 1) and so Var(X ) = Σ i S x i 2 (1/p i 1)

8 Confidence Intervals for HT estimates? Common to think of confidence intervals in terms of the normal distribution Because the width of the distribution is characterized by the variance But distribution of HT estimator not normal; so either 1. Find confidence intervals using variance without normal assumption 2. Exploit simple form of the distribution of the estimator

9 Towards Variance Based Bounds Theorem (Markov inequality) Let Y be non-neggtive random variable with E[Y] < Then Pr[ Y a ] E[Y] / a Proof: E[Y] = E[Y Y < a ] * Pr[ Y < a] + E[Y Y a] * Pr[ Y a ] 0 + a * Pr[ Y a ] Use: take a = n * E[Y]. Then Pr[ Y n * E[Y] ] 1/n Probability Y exceeds its expectation by factor n is less than 1/n Very general, but does not depend on (or exploit) variance More on bounds: Mitzenmacher & Upfal, Ch. 3

10 Variance Based Bounds Markov Inequality: Pr[ Y a ] E[Y] / a, for Y>0, E[Y} < Given r.v. Z, set Y = (Z E[Z]) 2 and a = b 2 Apply Markov inequality (assume E[(Z E[Z]) 2 = Var(Z) < ) Pr[ (Z E[Z]) 2 b 2 ] E[ (Z E[Z]) 2 ] / b 2 In other words (this is called Chebychev s Inequality) Pr[ Z E[Z] b ] Var[Z] / b 2 Interpretation: set b = n* Sqrt(Var[Z]) = n * StdDev(Z) Probability that Z further that n * StdDev away from mean is 1/n 2

11 Application to Packet Sampling Estimate packet counts (x i = 1) ; uniform sampling prob 1/N Chebychev Inequality for X = X (K) Var(X ) = Σ i S Var(x i ) = X *(N-1) Prob( X X b) X*(N-1) / b 2 Rewriting with b = n * X a large deviation (of X (K) proportional to its expectation X Prob( X - X n * X ) (N-1) / (n 2 * X) Behavior of bound Increasing in N: less certainty if fewer sampled packets Decreasing in X: more certainty if estimating a larger subset

12 Exponential Bounds Recall Markov Inequality: Pr[ Y a ] E[Y] / a for Y 0 with E[Y] < Obtained Chebychev using Y = (Z-E(Z)) 2. Try similar. For a r.v. Z, let Y = e tz for t 0, and set a = exp(tz 0 ) {Z z 0 } = {Y exp(tz 0 ) } since exp(tz 0 ) is increasing in z 0 when t 0 Apply Markov Inequality: Pr[Z z o ] exp(-tz 0 )E[e tz ] Which value of t to take? The value of t that minimizes the upper bound! Pr[Z z 0 ] min t 0 exp(-tz 0 ) E[e tz ] 12

13 Exponential Bounds and Packet Sampling Independent packet sampling with probability q = 1/N Exponential bound Pr[Z z 0 ] min t 0 exp(tz 0 ) E[e tz ] Take Z = X =X (K) and z =(1+ δ) X (large deviation) Product form: exp(t X ) = exp(tσ i S x i ) = Π i S exp(t x i ) x i are independent, so expectation preserves product E[exp(t X )] = Π i S E[exp(t x i )] = E[exp(t x 1 )] X Exponential moment for single x i E[exp(t x 1 )] = 1* Pr[x 1 =0] + e t/q * Pr[x 1 =1/q] = (1-q) + q*e t/q Combining: Pr[ X (1+δ) X ] f(q, δ) X with f(q, δ) = min t 0 e -t(1+δ) (1+q (e t/q -1)) 13

14 Minimization and Chernoff Bounds Common to use further bound 1+q (e t/q -1) Exp q(e t/q -1) More convenient final result, especially if non-equal probabilities q i f(q, δ) g(q, δ) = min t 0 e -t(1+δ) Exp q(e t/q -1) Minimized when t = q log(1+δ), yielding g(q, δ) = ( e δ / (1+δ) 1+δ ) q Combining: Pr[X (1+δ) X ] ( e δ / (1+δ) 1+δ ) qx Bound depends only on δ: the size of the large deviation qx = X/N: exponentially small in the average number of samples Simpler bound using e δ / (1+δ) 1+δ < Exp[-δ 2 / 3] for 0<δ<1 14

15 Illustration: 2 Bounds Exp[-δ 2 / 3] e δ / (1+δ) 1+δ δ 15

16 Illustration: Exponentially small in E[#samples]=n n=1 (e δ / (1+δ) 1+δ ) n n=10 n=100 δ 16

17 Corresponding Lower Bounds Pr[X (1-δ) X ] ( e -δ / (1-δ) 1-δ ) qx Simplified bound using e -δ / (1-δ) 1-δ < Exp[-δ 2 / 2] for 0<δ<1 Exercise: verify 17

18 Confidence Bounds for X So far: given X, we can compute bounds on distribution of X e.g. Pr[X (1+δ) X ] ε for some small ε = exp(-δ 2 X/(3N)) In practice, we are given estimate X, want to bound true X Idea for lower bound: Given some small target ε, and our estimate X Find largest possible value X min for X < X such that: Pr[estimate is X or larger] ε exp(-δ 2 X/(3N)) X=1 X=10 X=100 ε 18

19 Confidence Bounds for X Find largest possible value X min for X < X such that Pr[estimate is X or larger] ε i.e. Pr[X X =(1+δ) X ] ε (*) where X has same distribution as X but we now consider X fixed Use (any) upper bound e.g. Pr[X (1+δ)X] exp(-δ 2 X/(3N)) For (*) to hold it suffices that: exp(-δ 2 X/(3N)) ε with δ = X /X -1 i.e. - (X /X -1) 2 X/(3N) log(ε) (**) X min is largest such X < X for which (**) is true Exercise: X min = X + (β-(β(β+4x )) 1/2 )/2 where β = -3N log(ε) Similar upper bound X max for X, using Chernoff Lower bound 19

20 Summary of HT and Bounds HT tells us how to make unbiased estimates from samples Size of bytes in flows from packet sampled netflow How accurate is estimate Chebychev Inequality and Chernoff Bounds General results, but can be applied to HT estimates Confidence intervals for true quantities from estimates 20

Proving the central limit theorem

SOR3012: Stochastic Processes Proving the central limit theorem Gareth Tribello March 3, 2019 1 Purpose In the lectures and exercises we have learnt about the law of large numbers and the central limit