CS 70 Discrete Mathematics for CS, Spring 2008 (David Wagner)    Note 19

Variance

Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk one step to the left. How far do I expect to have traveled from my starting point after n steps?

Denoting a right-move by +1 and a left-move by −1, we can describe the probability space here as the set of all words of length n over the alphabet {±1}, each having equal probability 1/2^n. For instance, one possible outcome is (+1, +1, −1, ..., −1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X_1 + X_2 + ··· + X_n, where X_i = +1 if the ith toss is Heads, and X_i = −1 otherwise.

Now obviously we have E[X] = 0. The easiest rigorous way to see this is to note that E[X_i] = (1/2 × 1) + (1/2 × (−1)) = 0, so by linearity of expectation E[X] = ∑_{i=1}^{n} E[X_i] = 0. Thus after n steps, my expected position is 0!

But of course this is not very informative, and is due to the fact that positive and negative deviations from 0 cancel out. What the above question is really asking is: what is the expected value of |X|, our distance from 0? Unfortunately, computing the expected value of |X| turns out to be a little awkward, due to the absolute value operator. Therefore, rather than consider the r.v. |X|, we will instead look at the r.v. X^2. Notice that this also has the effect of making all deviations from 0 positive, so it should also give a good measure of the distance traveled. However, because it is the squared distance, we will need to take a square root at the end.

Let's calculate E[X^2]:

    E[X^2] = E[(X_1 + X_2 + ··· + X_n)^2]
           = E[ ∑_{i=1}^{n} X_i^2 + ∑_{i≠j} X_i X_j ]
           = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j]

In the last line here, we used linearity of expectation. To proceed, we need to compute E[X_i^2] and E[X_i X_j] (for i ≠ j). Let's consider first X_i^2. Since X_i can take on only the values ±1, clearly X_i^2 = 1 always, so E[X_i^2] = 1. What about E[X_i X_j]? Since X_i and X_j are independent, it is the case that E[X_i X_j] = E[X_i]E[X_j] = 0.¹ Plugging these values into the above equation gives

    E[X^2] = (n × 1) + 0 = n.

¹ For independent random variables X, Y, we have E[XY] = E[X]E[Y]. You are strongly encouraged to prove this as an exercise, along similar lines to our proof of linearity of expectation in an earlier lecture. Note that E[XY] = E[X]E[Y] is false for general r.v.'s X, Y; to see this, just look at the present example, where we have E[X_i^2] ≠ E[X_i]E[X_i].
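To make this concrete, here is a minimal Python sketch that estimates E[X] and E[X^2] by simulating the walk; the values of n and the number of trials are arbitrary illustrative choices, not part of the derivation. The two estimates should come out close to 0 and to n respectively.

    import random

    def random_walk_position(n):
        """Position after n steps of the +1/-1 random walk."""
        return sum(random.choice((+1, -1)) for _ in range(n))

    n, trials = 100, 20000                       # illustrative choices
    samples = [random_walk_position(n) for _ in range(trials)]
    print(sum(samples) / trials)                 # estimate of E[X], close to 0
    print(sum(x * x for x in samples) / trials)  # estimate of E[X^2], close to n = 100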

So we see that our expected squared distance from 0 is n. One interpretation of this is that we might expect to be a distance of about √n away from 0 after n steps. However, we have to be careful here: we cannot simply argue that E[|X|] = √(E[X^2]) = √n. (Why not?) We will see shortly (see Chebyshev's Inequality below) how to make precise deductions about |X| from knowledge of E[X^2].

For the moment, however, let's agree to view √(E[X^2]) as an intuitive measure of spread of the r.v. X. In fact, for a more general r.v. with expectation E[X] = µ, what we are really interested in is E[(X − µ)^2], the expected squared distance from the mean. In our random walk example, we had µ = 0, so E[(X − µ)^2] just reduces to E[X^2].

Definition 19.1 (variance): For a r.v. X with expectation E[X] = µ, the variance of X is defined to be

    Var(X) = E[(X − µ)^2].

The square root √Var(X) is called the standard deviation of X.

The point of the standard deviation is merely to undo the squaring in the variance. Thus the standard deviation is on the same scale as the r.v. itself. Since the variance and standard deviation differ just by a square, it really doesn't matter which one we choose to work with, as we can always compute one from the other immediately. We shall usually use the variance. For the random walk example above, we have that Var(X) = n, and the standard deviation of X is √n.

The following easy observation gives us a slightly different way to compute the variance that is simpler in many cases.

Theorem 19.1: For a r.v. X with expectation E[X] = µ, we have Var(X) = E[X^2] − µ^2.

Proof: From the definition of variance, we have

    Var(X) = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − 2µE[X] + µ^2 = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2.

In the third step here, we used linearity of expectation.

Let's see some examples of variance calculations.

1. Fair die. Let X be the score on the roll of a single fair die. Recall from an earlier lecture that E[X] = 7/2. So we just need to compute E[X^2], which is a routine calculation:

    E[X^2] = (1/6)(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = 91/6.

Thus from Theorem 19.1,

    Var(X) = E[X^2] − (E[X])^2 = 91/6 − 49/4 = 35/12.
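As a quick check of this example, here is a short Python sketch that redoes the calculation with exact fractions, using the formula from Theorem 19.1; it is only an illustration, not part of the derivation.

    from fractions import Fraction

    die = range(1, 7)                          # outcomes of a single fair die
    p = Fraction(1, 6)                         # probability of each outcome
    mean = sum(p * x for x in die)             # E[X] = 7/2
    mean_sq = sum(p * x * x for x in die)      # E[X^2] = 91/6
    print(mean, mean_sq, mean_sq - mean**2)    # variance = 35/12, by Theorem 19.1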

2. Biased coin. Let X be the number of Heads in n tosses of a biased coin with Heads probability p (i.e., X has the binomial distribution with parameters n, p). We already know that E[X] = np. Let X_i = 1 if the ith toss is Heads, and X_i = 0 otherwise. We can then write X = X_1 + X_2 + ··· + X_n, and then

    E[X^2] = E[(X_1 + X_2 + ··· + X_n)^2]
           = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j]
           = (n × p) + (n(n − 1) × p^2)
           = n^2 p^2 + np(1 − p).

In the third line here, we have used the facts that E[X_i^2] = p, and that E[X_i X_j] = E[X_i]E[X_j] = p^2 (since X_i, X_j are independent). Note that there are n(n − 1) pairs i, j with i ≠ j. Finally, we get that

    Var(X) = E[X^2] − (E[X])^2 = n^2 p^2 + np(1 − p) − (np)^2 = np(1 − p).

As an example, for a fair coin the expected number of Heads in n tosses is n/2, and the standard deviation is √n/2.

Notice that in fact Var(X) = ∑_i Var(X_i), and the same was true in the random walk example. This is no coincidence, and it depends on the fact that the X_i are mutually independent. (Exercise: Verify this.) So in the independent case we can compute variances of sums very easily. As we shall see in example 3, however, when the r.v.'s are not mutually independent, the variance is not linearly additive.

3. Number of fixed points. Let X be the number of fixed points in a random permutation of n items (i.e., the number of students in a class of size n who receive their own homework after shuffling). We saw in an earlier lecture that E[X] = 1 (regardless of n). To compute E[X^2], write X = X_1 + X_2 + ··· + X_n, where X_i = 1 if i is a fixed point, and X_i = 0 otherwise. Then as usual we have

    E[X^2] = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j].    (1)

Since X_i is an indicator r.v., we have that E[X_i^2] = Pr[X_i = 1] = 1/n. In this case, however, we have to be a bit more careful about E[X_i X_j]: note that we cannot claim as before that this is equal to E[X_i]E[X_j], because X_i and X_j are not independent (do you see why not?). But since both X_i and X_j are indicators, we can compute E[X_i X_j] directly as follows:

    E[X_i X_j] = Pr[X_i = 1 ∧ X_j = 1] = Pr[both i and j are fixed points] = 1/(n(n − 1)).

[Check that you understand the last step here.] Plugging this into equation (1) we get

    E[X^2] = (n × (1/n)) + (n(n − 1) × 1/(n(n − 1))) = 1 + 1 = 2.

Thus Var(X) = E[X^2] − (E[X])^2 = 2 − 1 = 1. In other words, the variance and the mean are both equal to 1. Like the mean, the variance is also independent of n. Intuitively at least, this means that it is unlikely that there will be more than a small number of fixed points even when the number of items, n, is very large.
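Since a mean and variance that do not grow with n may seem surprising, here is a small Python simulation sketch of example 3; the values of n and the number of trials are arbitrary illustrative choices. Both the sample mean and the sample variance should come out close to 1.

    import random

    def fixed_points(n):
        """Number of fixed points of a uniformly random permutation of n items."""
        perm = list(range(n))
        random.shuffle(perm)
        return sum(1 for i, x in enumerate(perm) if i == x)

    n, trials = 50, 20000                      # illustrative choices
    samples = [fixed_points(n) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((x - mean) ** 2 for x in samples) / trials
    print(mean, var)                           # both should be close to 1, independent of n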

Chebyshev's Inequality

We have seen that, intuitively, the variance (or, more correctly, the standard deviation) is a measure of spread, or deviation from the mean. Our next goal is to make this intuition quantitatively precise. What we can show is the following:

Theorem 19.2: [Chebyshev's Inequality] For a random variable X with expectation E[X] = µ, and for any α > 0,

    Pr[|X − µ| ≥ α] ≤ Var(X)/α^2.

Before proving Chebyshev's inequality, let's pause to consider what it says. It tells us that the probability of any given deviation, α, from the mean, either above it or below it (note the absolute value sign), is at most Var(X)/α^2. As expected, this deviation probability will be small if the variance is small. An immediate corollary of Chebyshev's inequality is the following:

Corollary 19.3: For a random variable X with expectation E[X] = µ, and standard deviation σ = √Var(X),

    Pr[|X − µ| ≥ βσ] ≤ 1/β^2.

Proof: Plug α = βσ into Chebyshev's inequality.

So, for example, we see that the probability of deviating from the mean by more than (say) two standard deviations on either side is at most 1/4. In this sense, the standard deviation is a good working definition of the width or spread of a distribution.

We should now go back and prove Chebyshev's inequality. The proof will make use of the following simpler bound, which applies only to non-negative random variables (i.e., r.v.'s which take only values ≥ 0).

Theorem 19.4: [Markov's Inequality] For a non-negative random variable X with expectation E[X] = µ, and any α > 0,

    Pr[X ≥ α] ≤ E[X]/α.

Proof: From the definition of expectation, we have

    E[X] = ∑_a a × Pr[X = a]
         = ∑_{a<α} a × Pr[X = a] + ∑_{a≥α} a × Pr[X = a]
         ≥ ∑_{a≥α} a × Pr[X = a]
         ≥ α ∑_{a≥α} Pr[X = a]
         = α Pr[X ≥ α].

The crucial step here is the third line, where we have used the fact that X takes on only non-negative values, and consequently a ≥ 0 and Pr[X = a] ≥ 0. (This step is not valid otherwise, since if X can take on negative values, then we might have a × Pr[X = a] < 0.)

Now we can prove Chebyshev's inequality quite easily.

Proof of Theorem 19.2: Define the r.v. Y = (X − µ)^2. Note that E[Y] = E[(X − µ)^2] = Var(X). Also, notice that the probability we are interested in, Pr[|X − µ| ≥ α], is exactly the same as Pr[Y ≥ α^2]. (This is because the inequality |X − µ| ≥ α is true if and only if (X − µ)^2 ≥ α^2 is true, i.e., if and only if Y ≥ α^2 is true.) Moreover, Y is obviously non-negative, so we can apply Markov's inequality to it to get

    Pr[Y ≥ α^2] ≤ E[Y]/α^2 = Var(X)/α^2.

This completes the proof.
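As a concrete illustration of both bounds, here is a small Python sketch that checks them exactly for the fair die of example 1; the thresholds 5 and 2 are arbitrary illustrative choices.

    from fractions import Fraction

    die = range(1, 7)
    p = Fraction(1, 6)
    mu = sum(p * x for x in die)                          # E[X] = 7/2
    var = sum(p * (x - mu) ** 2 for x in die)             # Var(X) = 35/12

    # Markov's inequality: Pr[X >= 5] <= E[X]/5
    print(sum(p for x in die if x >= 5), "<=", mu / 5)    # 1/3 <= 7/10

    # Chebyshev's inequality: Pr[|X - mu| >= 2] <= Var(X)/2^2
    print(sum(p for x in die if abs(x - mu) >= 2), "<=", var / 4)   # 1/3 <= 35/48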

Let's apply Chebyshev's inequality to answer our question about the random walk at the beginning of this lecture note. Recall that X is our position after n steps, and that E[X] = 0 and Var(X) = n. Corollary 19.3 says that, for any β > 0,

    Pr[|X| ≥ β√n] ≤ 1/β^2.

Thus, for example, if we take n = 10^6 steps, the probability that we end up more than 10000 steps away from our starting point is at most 1/100.

Here are a few more examples of applications of Chebyshev's inequality (you should check the algebra in them):

1. Coin tosses. Let X be the number of Heads in n tosses of a fair coin. The probability that X deviates from µ = n/2 by more than √n is at most 1/4. The probability that it deviates by more than 5√n is at most 1/100.

2. Fixed points. Let X be the number of fixed points in a random permutation of n items; recall that E[X] = Var(X) = 1. Thus the probability that more than (say) 10 students get their own homeworks after shuffling is at most 1/100, however large n is.²

² In more detail: Pr[X > 10] = Pr[X ≥ 11] ≤ Pr[|X − 1| ≥ 10] ≤ Var(X)/100 = 1/100.
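Returning to example 1 above, a small Python simulation sketch (with illustrative values of n and the number of trials) shows that the empirical deviation probability is typically far below the Chebyshev bound of 1/4, as expected, since Chebyshev's inequality is often quite loose.

    import random
    from math import sqrt

    def heads_count(n):
        """Number of Heads in n fair coin tosses."""
        return sum(random.getrandbits(1) for _ in range(n))

    n, trials = 1000, 2000                     # illustrative choices
    deviations = sum(1 for _ in range(trials)
                     if abs(heads_count(n) - n / 2) > sqrt(n))
    print(deviations / trials)                 # empirical probability; Chebyshev gives <= 1/4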