PROBABILITY AND STATISTICS IN COMPUTING

III. Discrete Random Variables and Deviations

From: [5][7][6]

German Hernandez
1. Random variables

1.1. Measurable function

Let (Ω, A) and (Θ, G) be measurable spaces. A function Υ : Ω → Θ is called a measurable function if

Υ⁻¹(G) ∈ A, for all G ∈ G,

i.e., the pre-image of a measurable set is measurable.

In order to define real random variables we use a special σ-algebra on the real set, called the Borel σ-algebra and denoted B(R) or simply B. This is the minimal σ-algebra on R that contains the sets of the form (−∞, x] for all x ∈ R.
1.2. Random variable

A random variable (r.v.) X is a Borel measurable function defined on a probability space. Here we will consider only real random variables, i.e., a r.v. is of the form

X : (Ω, A, P) → (R, B).

A r.v. can be seen as a translation or encoding of the outcomes of a random experiment into the real set. The condition of Borel measurability allows us to translate or encode the probabilistic structure of the experiment in a consistent manner, from the events of the random experiment to Borel sets.
[Figure: a random variable X : Ω → R maps each outcome ω₁, ω₂, ..., ωₖ of the probability space (Ω, A, P) to a real number.]
1.3. Induced probability by a r.v.

A random variable X : Ω → R from a probability space (Ω, A, P) defines an induced probability measure P_X on (R, B), given by

P_X : B → [0, 1]
B ↦ P(X⁻¹(B)).

Usually, it is difficult to work directly with the induced probability. The alternative is to work with a function on R, called a distribution function, that encodes all the relevant information of the induced probability.
1.4. Distribution function of a r.v.

The distribution function or cumulative distribution function of a r.v. X, denoted F_X, is defined as

F_X : R → R
x ↦ P(X ≤ x) = P(X⁻¹((−∞, x])).

[Figure: the distribution function F_X(x) = P(X⁻¹((−∞, x])) is obtained by pulling the Borel set (−∞, x] back through X to an event of the probability space (Ω, A, P) and taking its probability.]
The distribution function has the following properties:

(i) lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1,
(ii) F_X is a nondecreasing function, and
(iii) F_X is right continuous.
1.5. Types of random variables

A random variable is called discrete if its range is countable. In this case we can define the probability mass function

f_X(x) = P(X = x).

A random variable is called continuous if its distribution function F_X is continuous, and it is called absolutely continuous if there exists a non-negative Riemann integrable function f_X : R → [0, ∞), called the probability density of the r.v., such that

F_X(x) = ∫_{−∞}^{x} f_X(τ) dτ.
2. Expectation

Let X be a discrete random variable taking on the values x₁, x₂, ..., x_k, .... The mean value or expectation of X, denoted by E[X], is defined by

E[X] = Σ_{k=1}^{∞} x_k P(X = x_k).

Let X be a continuous random variable taking values on R. Then E[X] is defined by

E[X] = ∫_{−∞}^{∞} τ f_X(τ) dτ.
Example 1. Indicator random variable

Given an event A, the indicator random variable of the event, denoted I_A or sometimes I{A}, is equal to 1 if A occurs and 0 if A does not occur:

I_A = { 1, if A occurs
        0, if A^c occurs.

Find its expectation.

E[I_A] = 1 · P(A) + 0 · P(A^c) = P(A)

A random variable X that only takes on the values 0 or 1 is called a Bernoulli random variable, and its expected value is E[X] = P(X = 1).

Example 2. Geometric random variable

Consider a sequence of independent trials, each of which is a success with probability p ∈ (0, 1) and a failure with probability 1 − p. If X represents the trial number of the first success, then X is said to be a geometric random variable with parameter p. Compute its expected value.
We have that

P{X = n} = p(1 − p)^{n−1}, n ≥ 1.

Hence,

E[X] = Σ_{n=1}^{∞} n p(1 − p)^{n−1} = p Σ_{n=1}^{∞} n(1 − p)^{n−1} = p · 1/p² = 1/p,

using, with a = 1 − p, the identity

Σ_{n=1}^{∞} n a^{n−1} = d(Σ_{n=0}^{∞} aⁿ)/da = d(1/(1 − a))/da = 1/(1 − a)².
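The series computation above can be checked numerically. The following sketch (ours, not from the notes; the function name is our own) sums the first terms of Σ n p(1 − p)^{n−1} and compares the result with 1/p.

```python
# Numerical check: the partial sums of sum_{n>=1} n * p * (1-p)^(n-1),
# the mean of a geometric(p) random variable, approach 1/p.
def geometric_mean_partial_sum(p, terms=10_000):
    """Partial sum of n * P(X = n) for a geometric(p) random variable."""
    return sum(n * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

for p in (0.1, 0.5, 0.9):
    # each partial sum is very close to the closed form 1/p
    print(p, geometric_mean_partial_sum(p), 1 / p)
```

Since (1 − p)^{n−1} decays geometrically, a few thousand terms already agree with 1/p to many decimal places.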
Example 3. Unbounded expectation

Let X take the value 2^i with probability 2^{−i}, i.e.,

P(X = 2^i) = 1/2^i, i = 1, 2, ...

Then

E[X] = Σ_{i=1}^{∞} 2^i · (1/2^i) = Σ_{i=1}^{∞} 1 = ∞.
Theorem 1. Let X be a discrete random variable that takes only nonnegative integer values. Then

E[X] = Σ_{i=1}^{∞} P(X ≥ i).

Proof.

Σ_{i=1}^{∞} P(X ≥ i) = Σ_{i=1}^{∞} Σ_{j=i}^{∞} P(X = j) = Σ_{j=1}^{∞} Σ_{i=1}^{j} P(X = j) = Σ_{j=1}^{∞} j P(X = j) = E[X].
Example 4. Geometric random variable (again)

For a geometric r.v. with parameter p,

P(X ≥ i) = Σ_{n=i}^{∞} p(1 − p)^{n−1} = (1 − p)^{i−1}.

Hence, by Theorem 1,

E[X] = Σ_{i=1}^{∞} P(X ≥ i) = Σ_{i=1}^{∞} (1 − p)^{i−1} = 1/(1 − (1 − p)) = 1/p.
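The tail-sum identity of Theorem 1 can also be checked numerically for the geometric case: the mean computed from the pmf and the mean computed from the tail probabilities agree. A sketch (ours; helper names are hypothetical):

```python
# Check E[X] = sum_{i>=1} P(X >= i) for a geometric(p) variable.
def mean_from_pmf(p, terms=5000):
    """E[X] via sum of n * P(X = n)."""
    return sum(n * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

def mean_from_tails(p, terms=5000):
    """E[X] via sum of P(X >= i) = (1 - p)^(i - 1)."""
    return sum((1 - p) ** (i - 1) for i in range(1, terms + 1))

p = 0.3
# both computations agree, and both equal 1/p
print(mean_from_pmf(p), mean_from_tails(p), 1 / p)
```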
2.1. Linearity

Given a family of r.v.'s {X_i}_{i=1,...,n},

E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i].

Proof (for two variables; the general case follows by induction).

E[X + Y] = Σ_x Σ_y (x + y) P((X = x) ∩ (Y = y))
= Σ_x Σ_y x P((X = x) ∩ (Y = y)) + Σ_x Σ_y y P((X = x) ∩ (Y = y))
= Σ_x x Σ_y P((X = x) ∩ (Y = y)) + Σ_y y Σ_x P((X = x) ∩ (Y = y))
= Σ_x x P(X = x) + Σ_y y P(Y = y)
= E[X] + E[Y].

Theorem 2. E[cX] = c E[X].
Example 5. Bubble sort

Let X denote the number of comparisons needed by Bubble sort. Obtain upper and lower bounds for E[X].

Bubble-Sort(A[1..n])
1 for i = 1 to n
2   do for j = 1 to n − i
3        do if A[j] > A[j + 1]
4             then A[j] ↔ A[j + 1]
5 if no swap occurs, exit
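A runnable Python version of the pseudocode above, instrumented to count comparisons (a sketch; the function and variable names are ours):

```python
def bubble_sort(a):
    """Bubble sort with an early exit when a pass makes no swap.
    Returns the sorted list and the number of comparisons made."""
    a = list(a)  # do not mutate the caller's list
    n = len(a)
    comparisons = 0
    for i in range(1, n):            # passes i = 1 .. n-1
        swapped = False
        for j in range(0, n - i):    # compare adjacent pairs
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:              # no swap in this pass: already sorted
            break
    return a, comparisons

print(bubble_sort([5, 3, 8, 7, 0]))  # ([0, 3, 5, 7, 8], 10)
```

On the 5-element input shown, every pass runs in full, giving 4 + 3 + 2 + 1 = 10 = n(n − 1)/2 comparisons, the worst case discussed below.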
[Figure: trace of Bubble-Sort with n = 5 on the array (5, 3, 8, 7, 0), showing the array after each comparison step j = 1, 2, ... within the passes i = 1, ..., 4, until it is sorted as (0, 3, 5, 7, 8).]
In the worst case bubble sort requires

(n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2

comparisons; then

E[X] ≤ n(n − 1)/2.

In order to obtain a lower bound we need the concept of the number of inversions of a permutation. For a permutation i₁, i₂, ..., i_n of 1, 2, ..., n we say that an ordered pair (i, j) is an inversion of the permutation if i < j and j precedes i in the permutation. For instance, the permutation 2, 4, 1, 5, 6, 3 has five inversions, namely (1, 2), (1, 4), (3, 4), (3, 5), (3, 6). Because the values of every inversion pair will eventually have to be interchanged (and thus compared), it follows that the number of comparisons made by bubble sort is at least as large as the number of inversions of the initial ordering.
Then, if I denotes the number of inversions, I ≤ X, which implies that E[I] ≤ E[X]. Let, for i < j,

I(i, j) = { 1, if (i, j) is an inversion of the initial ordering
            0, otherwise;

then it follows that

I = Σ_{i<j} I(i, j).
Hence, using linearity of the expectation,

E[I] = Σ_{i<j} E[I(i, j)].

Now, for i < j, assuming the initial ordering is a uniformly random permutation,

E[I(i, j)] = P{j precedes i in the initial ordering} = 1/2.

There are (n choose 2) pairs i, j for which i < j, so it follows that

E[I] = Σ_{i<j} E[I(i, j)] = (1/2)(n choose 2) = n(n − 1)/4.

Thus

n(n − 1)/4 ≤ E[X] ≤ n(n − 1)/2,

so E[X] = Θ(n²).
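The lower-bound argument rests on the expected number of inversions of a random permutation being n(n − 1)/4, which is easy to verify by simulation (a sketch, ours; it also checks the five-inversion example from the text):

```python
import random

def count_inversions(perm):
    """Number of ordered pairs (i, j), i < j, where j precedes i,
    i.e. pairs of positions holding out-of-order values."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if perm[a] > perm[b])

random.seed(0)
n, trials = 10, 2000
total = 0
for _ in range(trials):
    perm = list(range(1, n + 1))
    random.shuffle(perm)
    total += count_inversions(perm)
avg = total / trials
print(avg, n * (n - 1) / 4)  # the average is near n(n-1)/4 = 22.5
```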
3. The Quicksort and Find Algorithms

Suppose that we want to sort a given set of n distinct values x₁, x₂, ..., x_n. A more efficient algorithm than bubble sort for doing so is the quicksort algorithm, which is recursively defined as follows. When n = 2, the algorithm compares the two values and puts them in the appropriate order. When n > 2, one of the values is chosen, say x_i, and then all of the other values are compared with x_i. Those smaller than x_i are put in a bracket to the left of x_i, and those larger than x_i are put in a bracket to the right of x_i. The algorithm then repeats itself on these brackets, continuing until all values have been sorted. For instance, suppose that we desire to sort the following 10 distinct values:
5, 9, 3, 10, 11, 14, 8, 4, 17, 6

{5, 9, 3, 8, 4, 6}, 10, {11, 14, 17}
{5, 3, 4}, 6, {9, 8}, 10, {11, 14, 17}
{3}, 4, {5}, 6, {9, 8}, 10, {11, 14, 17}

It is intuitively clear that the worst case occurs when every comparison value chosen is an extreme value. In this worst scenario, the number of comparisons needed is (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2. A better indication of the algorithm's efficiency is obtained by determining the average number of comparisons needed when the comparison values are randomly chosen.
Let X denote the number of comparisons needed. Relabel the values so that 1 denotes the smallest value, 2 the second smallest, and so on. Then, for 1 ≤ i < j ≤ n, let I(i, j) equal 1 if i and j are ever directly compared, and 0 otherwise. Then

X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} I(i, j),

which implies that

E[X] = E[Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} I(i, j)]
= Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[I(i, j)]
= Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P{i and j are ever compared}.
To determine the probability that i and j are ever compared, note that the values i, i + 1, ..., j − 1, j will initially be in the same bracket and will remain in the same bracket as long as no comparison value between i and j is chosen. The values i and j are compared if and only if the first comparison value chosen from i, i + 1, ..., j is either i or j; since each of these j − i + 1 values is equally likely to be chosen first,

P{i and j are ever compared} = 2/(j − i + 1).
Consequently, we see that

E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
= 2 Σ_{i=1}^{n−1} (1/2 + 1/3 + ... + 1/(n − i + 1))
= 2 Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 1/k
= 2 Σ_{k=2}^{n} (n + 1 − k)(1/k)
= 2(n + 1) Σ_{k=2}^{n} 1/k − 2(n − 1)
= (2n + 2) Σ_{k=1}^{n} 1/k − 4n.

For large n,

Σ_{k=1}^{n} 1/k ≈ ln(n).

Thus, the quicksort algorithm requires, on average, approximately 2n ln(n) comparisons to sort n values.
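A simulation of the random-comparison-value quicksort confirms the count: each partition step makes len − 1 comparisons against the chosen value, and the average total matches 2(n + 1)H_n − 4n, the exact form of the sum derived above. This is our sketch, not code from the notes:

```python
import math
import random

def quicksort_comparisons(values):
    """Comparisons made by quicksort when the comparison value is
    chosen uniformly at random (values assumed distinct)."""
    if len(values) <= 1:
        return 0
    pivot = random.choice(values)
    left = [x for x in values if x < pivot]
    right = [x for x in values if x > pivot]
    # len(values) - 1 comparisons against the pivot, then recurse
    return (len(values) - 1
            + quicksort_comparisons(left)
            + quicksort_comparisons(right))

random.seed(1)
n, trials = 200, 200
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
exact = 2 * (n + 1) * harmonic - 4 * n
print(avg, exact, 2 * n * math.log(n))  # avg tracks the exact sum
```

For moderate n the exact expression 2(n + 1)H_n − 4n is noticeably below the asymptotic 2n ln(n), which is why both are printed.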
3.1. The Find Algorithm

Suppose that we want to find the kth smallest value of a list of n distinct values. The find algorithm is quite similar to quicksort: a comparison value is chosen and the remaining values are split into a left bracket (smaller values) and a right bracket (larger values), for instance

{2, 5, 4, 3}, 6, {10, 12, 16, 8}.

Suppose that r items are put in the bracket to the left (counting the comparison value itself with the left bracket). There are now three possibilities:

1. r = k: the comparison value is the kth smallest and the algorithm ends.
2. r < k: the kth smallest value is the (k − r)th smallest of the n − r values in the right bracket.
3. r > k: search for the kth smallest of the r values in the left bracket.
We have, as in the quicksort analysis,

X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} I(i, j)

and

E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P{i and j are ever compared}.

To determine the probability that i and j are ever compared, we consider three cases.

Case 1: i < j ≤ k. In this case i, j, k will remain together until one of the values i, i + 1, ..., k is chosen as the comparison value, and i and j are compared if and only if that first chosen value is i or j. Hence

P{i and j are ever compared} = 2/(k − i + 1).
Case 2: i ≤ k < j.

P{i and j are ever compared} = 2/(j − i + 1)

Case 3: k < i < j.

P{i and j are ever compared} = 2/(j − k + 1)
It follows from the preceding that

E[X] = Σ_{j=2}^{k} Σ_{i=1}^{j−1} 2/(k − i + 1) + Σ_{j=k+1}^{n} Σ_{i=1}^{k} 2/(j − i + 1) + Σ_{j=k+2}^{n} Σ_{i=k+1}^{j−1} 2/(j − k + 1).

To approximate the preceding when n and k are large, let k = αn for 0 < α < 1. Now,

Σ_{j=2}^{k} Σ_{i=1}^{j−1} 1/(k − i + 1) = Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} 1/(k − i + 1)
= Σ_{i=1}^{k−1} (k − i)/(k − i + 1)
≈ k − log(k)
≈ k = αn.
Next,

Σ_{j=k+1}^{n} Σ_{i=1}^{k} 1/(j − i + 1) = Σ_{j=k+1}^{n} (1/(j − k + 1) + ... + 1/j)
≈ Σ_{j=k+1}^{n} (log(j) − log(j − k))
≈ ∫_{k}^{n} log(x) dx − ∫_{1}^{n−k} log(x) dx
≈ n log(n) − n − (αn log(αn) − αn) − [(n − αn) log(n − αn) − (n − αn)]
= n[−α log(α) − (1 − α) log(1 − α)].

Similarly,

Σ_{j=k+2}^{n} Σ_{i=k+1}^{j−1} 1/(j − k + 1) = Σ_{j=k+2}^{n} (j − k − 1)/(j − k + 1) ≈ n − k = n(1 − α),

so we see that

E[X] ≈ 2n[1 − α log(α) − (1 − α) log(1 − α)].

Thus, the mean number of comparisons needed by the find algorithm is a linear function of the number of values.
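The linear behavior can be seen in a simulation of the find algorithm (quickselect) with a random comparison value. This is our sketch, with our own names; it compares the average comparison count against the formula 2n[1 − α log α − (1 − α) log(1 − α)] just derived:

```python
import math
import random

def find_comparisons(values, k):
    """Comparisons used by the Find algorithm to locate the kth smallest
    of distinct values, choosing the comparison value at random."""
    if len(values) <= 1:
        return 0
    pivot = random.choice(values)
    left = [x for x in values if x < pivot]
    right = [x for x in values if x > pivot]
    c = len(values) - 1              # everything is compared to the pivot
    if k <= len(left):               # answer is in the left bracket
        return c + find_comparisons(left, k)
    if k == len(left) + 1:           # the pivot is the answer
        return c
    # answer is in the right bracket
    return c + find_comparisons(right, k - len(left) - 1)

random.seed(2)
n, alpha, trials = 400, 0.5, 300
k = int(alpha * n)
avg = sum(find_comparisons(list(range(n)), k) for _ in range(trials)) / trials
pred = 2 * n * (1 - alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha))
print(avg, pred)  # the simulated average tracks the asymptotic formula
```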
4. Markov's Inequality

Let X be a random variable that assumes only nonnegative values. Then for all a > 0,

P{X ≥ a} ≤ E[X]/a.
Proof. For a > 0, let I be the indicator variable of {X ≥ a}:

I = { 1, if X ≥ a
      0, otherwise.

Then, since X ≥ 0,

aI ≤ X
E[aI] ≤ E[X]
E[I] ≤ E[X]/a
P{X ≥ a} = E[I] ≤ E[X]/a.

The inequality is interesting for a > E[X], and in particular for a = n E[X]. It can be applied when too little is known about a distribution. In general it is too weak to yield useful bounds, but it is fundamental for developing other useful bounds.
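As a small illustration (ours, not from the notes), the bound can be checked exactly for a geometric(p) variable, where P(X ≥ a) = (1 − p)^{a−1} for integer a and E[X] = 1/p:

```python
# Check Markov's inequality P(X >= a) <= E[X]/a for geometric(p) variables.
def markov_holds(p, a):
    """True if Markov's bound holds at integer level a >= 1."""
    tail = (1 - p) ** (a - 1)   # exact P(X >= a) for a geometric(p) r.v.
    bound = (1 / p) / a         # E[X]/a with E[X] = 1/p
    return tail <= bound

checks = [markov_holds(p, a) for p in (0.2, 0.5, 0.8) for a in (1, 2, 5, 20)]
print(all(checks))  # True: the bound holds in every case
```

Note how loose the bound is for large a: the true tail decays geometrically while the bound only decays like 1/a.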
5. Variance

Let X be a random variable. The variance Var[X] of X is defined by

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])².

Let X, Y be random variables. The covariance Cov[X, Y] of X and Y is defined by

Cov[X, Y] = E[(X − E[X])(Y − E[Y])].
5.1. Properties

1. Cov[X, Y] = E[XY] − E[X]E[Y]
2. Cov[X, Y] = Cov[Y, X]
3. Cov[X, X] = Var[X]
4. Cov[cX, Y] = c Cov[X, Y]
5. Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z]
6. Cov(Σ_{i=1}^{n} X_i, Σ_{j=1}^{m} Y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(X_i, Y_j)
7. If X and Y are independent then Cov(X, Y) = 0
8. Var(Σ_{i=1}^{n} X_i) = Cov(Σ_{i=1}^{n} X_i, Σ_{j=1}^{n} X_j)
   = Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(X_i, X_j)
   = Σ_{i=1}^{n} Cov(X_i, X_i) + Σ_{i=1}^{n} Σ_{j≠i} Cov(X_i, X_j)
   = Σ_{i=1}^{n} Var(X_i) + 2 Σ_{i=1}^{n} Σ_{j<i} Cov(X_i, X_j)
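Property 8 holds for sample moments as well, since the empirical covariance is bilinear. The following sketch (ours) checks it on simulated data with deliberately correlated coordinates, where the two sides agree up to floating-point rounding:

```python
import random

random.seed(3)
N = 2000
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [x + 0.5 * random.gauss(0, 1) for x in xs]  # correlated with xs
zs = [random.gauss(0, 1) for _ in range(N)]      # independent of both

def cov(u, v):
    """Empirical covariance of two equal-length samples."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

cols = [xs, ys, zs]
s = [a + b + c for a, b, c in zip(xs, ys, zs)]
lhs = cov(s, s)  # Var of the sum
rhs = sum(cov(u, u) for u in cols) + 2 * sum(
    cov(cols[i], cols[j]) for i in range(3) for j in range(i + 1, 3))
print(lhs, rhs)  # equal up to floating-point rounding
```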
Example 6 [7]. A coupon collecting problem

Suppose that there are m different types of coupons, and each coupon one obtains is equally likely to be of any of these types. If X denotes the number of coupons one needs to collect to have at least one of each type, find the expected value of X.

Let X₁ be the number of picks required to get the first type of coupon, X₂ the number of additional picks required to get a second distinct type, X₃ the number of additional picks required to get a third distinct type, and so on. On the first pick we get one type of coupon, so X₁ = 1. From then on, the number of picks required to get a new type of coupon is geometrically distributed with success probability p₂ = (m − 1)/m, i.e.,

P(X₂ = k) = (1 − p₂)^{k−1} p₂, for k = 1, 2, ...,

and the expected time to get the second type of coupon is

E[X₂] = 1/p₂ = m/(m − 1).
Inductively, once we have gotten i − 1 different types of coupons, the number of additional picks X_i to get the ith distinct type is geometrically distributed with success probability

p_i = (m − (i − 1))/m,

i.e.,

P(X_i = k) = (1 − p_i)^{k−1} p_i, for k = 1, 2, ...,

and the expected time to get the ith type of coupon is

E[X_i] = 1/p_i = m/(m − i + 1).
The time to get all the coupons is

X = X₁ + X₂ + ... + X_m,

so the expected time to get all the coupons is

E[X] = E[X₁] + E[X₂] + ... + E[X_m]
= 1 + m/(m − 1) + ... + m/1
= m (1/m + 1/(m − 1) + ... + 1)
= m Σ_{i=1}^{m} 1/i
≈ m log m.

It is easy to see that the geometric random variables X_i are independent and

Var(X_i) = (1 − p_i)/p_i².
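The coupon collecting process is easy to simulate, and the sample mean matches m Σ 1/i (a sketch, ours; the function name is our own):

```python
import math
import random

def coupons_needed(m, rng):
    """Simulate picking uniform coupons until all m types are seen;
    return the number of picks."""
    seen, picks = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        picks += 1
    return picks

rng = random.Random(4)
m, trials = 20, 2000
avg = sum(coupons_needed(m, rng) for _ in range(trials)) / trials
exact = m * sum(1 / i for i in range(1, m + 1))  # m * H_m
print(avg, exact, m * math.log(m))  # sample mean tracks m * H_m
```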
Then

Var(X) = Σ_{i=1}^{m} Var(X_i) = Σ_{i=1}^{m} (1 − p_i)/p_i² ≤ Σ_{i=1}^{m} 1/p_i² = Σ_{i=1}^{m} m²/(m − i + 1)² = m² Σ_{i=1}^{m} 1/i² < m² π²/6.
Example 7 [7]. Another coupon collecting problem

Find the expected value and variance of the number of distinct types of coupons in a collection of n coupons.

Let X be the number of types of coupons in the collection, and write

X = Σ_{i=1}^{m} X_i,

where

X_i = { 1, if a coupon of type i is in the collection
        0, otherwise.

As X_i is a Bernoulli variable we have

E[X_i] = 1 − ((m − 1)/m)ⁿ

and

E[X] = Σ_{i=1}^{m} E[X_i] = m (1 − ((m − 1)/m)ⁿ).
Also,

Var(X_i) = ((m − 1)/m)ⁿ (1 − ((m − 1)/m)ⁿ).

For i ≠ j, X_i X_j is also Bernoulli, with

E[X_i X_j] = P(X_i X_j = 1) = P(A_i A_j),

where A_k is the event that the collection contains at least one coupon of type k. Then

P(A_i A_j) = 1 − P((A_i A_j)^c)
= 1 − P(A_i^c ∪ A_j^c)
= 1 − P(A_i^c) − P(A_j^c) + P(A_i^c A_j^c)
= 1 − 2((m − 1)/m)ⁿ + ((m − 2)/m)ⁿ.

Therefore, for i ≠ j,

Cov[X_i, X_j] = E[X_i X_j] − E[X_i]E[X_j]
= 1 − 2((m − 1)/m)ⁿ + ((m − 2)/m)ⁿ − (1 − ((m − 1)/m)ⁿ)²
= ((m − 2)/m)ⁿ − ((m − 1)/m)^{2n}.

Hence

Var(X) = m [((m − 1)/m)ⁿ (1 − ((m − 1)/m)ⁿ)] + m(m − 1) [((m − 2)/m)ⁿ − ((m − 1)/m)^{2n}].
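The formula for E[X] can be checked by simulation: draw n uniform coupons and count distinct types (a sketch, ours; the function name is hypothetical):

```python
import random

def types_collected(m, n, rng):
    """Number of distinct coupon types among n uniform picks from m types."""
    return len({rng.randrange(m) for _ in range(n)})

rng = random.Random(5)
m, n, trials = 10, 15, 4000
avg = sum(types_collected(m, n, rng) for _ in range(trials)) / trials
expected = m * (1 - ((m - 1) / m) ** n)
print(avg, expected)  # simulated mean is close to m(1 - ((m-1)/m)^n)
```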
Example 8 [7]. Runs

Let X₁, X₂, ..., X_n be a sequence of independent binary random variables with P(X_i = 1) = p. A maximal consecutive subsequence of 1s is called a run. For instance, the sequence 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0 has 3 runs. Let R be the number of runs; find E[R] and Var(R).

Let, for i = 1, 2, ..., n,

I_i = { 1, if a run begins at position i
        0, otherwise;

then

R = Σ_{i=1}^{n} I_i.
Because

E[I₁] = P(X₁ = 1) = p
E[I_i] = P(X_{i−1} = 0, X_i = 1) = (1 − p)p, for i > 1,

it follows that

E[R] = Σ_{i=1}^{n} E[I_i] = p + (n − 1)p(1 − p).

To compute the variance, write

Var(R) = Σ_{i=1}^{n} Var(I_i) + 2 Σ_{i<j} Cov(I_i, I_j).
Because I_i is a Bernoulli random variable we have that

Var(I_i) = E[I_i](1 − E[I_i]),

and also that if j > i + 1 then I_i and I_j are independent, implying that Cov(I_i, I_j) = 0. Moreover, I_i I_{i+1} = 0, since runs cannot begin at two consecutive positions; then

Cov(I_i, I_{i+1}) = −E[I_i]E[I_{i+1}].

Hence,

Var(R) = Var(I₁) + Σ_{i=2}^{n} Var(I_i) + 2 Cov(I₁, I₂) + 2 Σ_{i=3}^{n} Cov(I_{i−1}, I_i)
= p(1 − p) + (n − 1) p(1 − p)[1 − p(1 − p)] − 2p²(1 − p) − 2(n − 2) p²(1 − p)².

Application: the number of page ranges for a word in a book index.
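A quick simulation (ours) confirms the formula for E[R], and also checks the 3-run example sequence from the text:

```python
import random

def count_runs(bits):
    """Number of maximal consecutive blocks of 1s in a 0/1 sequence."""
    runs, prev = 0, 0
    for b in bits:
        if b == 1 and prev == 0:  # a run begins here
            runs += 1
        prev = b
    return runs

rng = random.Random(6)
n, p, trials = 50, 0.3, 4000
avg = sum(count_runs([1 if rng.random() < p else 0 for _ in range(n)])
          for _ in range(trials)) / trials
expected = p + (n - 1) * p * (1 - p)
print(avg, expected)  # sample mean is near p + (n-1)p(1-p)
```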
6. Chebyshev's Inequality

For any a > 0,

P{|X − E[X]| ≥ a} ≤ Var[X]/a².
Proof.

P(|X − E[X]| ≥ a) = P((X − E[X])² ≥ a²).

Since (X − E[X])² is a nonnegative r.v., applying Markov's inequality gives

P{(X − E[X])² ≥ a²} ≤ E[(X − E[X])²]/a² = Var[X]/a².

The inequality is interesting for a = n √(Var[X]).
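A concrete check (ours, not from the notes): for a fair six-sided die, with mean 3.5 and variance 35/12, the exact tail probabilities never exceed the Chebyshev bound.

```python
# Chebyshev's inequality checked exactly for a fair die roll.
vals = [1, 2, 3, 4, 5, 6]
mu = sum(vals) / 6                                # 3.5
var = sum((v - mu) ** 2 for v in vals) / 6        # 35/12

def chebyshev_holds(a):
    """True if P(|X - mu| >= a) <= Var[X]/a^2 at level a."""
    tail = sum(1 for v in vals if abs(v - mu) >= a) / 6  # exact tail
    return tail <= var / a ** 2

print(all(chebyshev_holds(a) for a in (1.8, 2.0, 2.5)))  # True
```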
References

[1] G.R. Grimmett and D.R. Stirzaker. Probability and Random Processes. Oxford Science Publications, 1998.
[2] C.M. Grinstead and J.L. Snell. Introduction to Probability. American Mathematical Society, 1997.
[3] H.P. Hsu. Probability, Random Variables, and Random Processes. McGraw-Hill Schaum, 1996.
[4] F.P. Kelly. Probability. Notes from The Archimedeans, http://www.cam.ac.uk/societies/archim/notes.html, 1996.
[5] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[6] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[7] S.M. Ross. Probability Models for Computer Science. Academic Press, 2002.