CS 70 Discrete Mathematics for CS, Spring 2008 (David Wagner)    Note 19

Variance

Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk one step to the left. How far do I expect to have traveled from my starting point after n steps?

Denoting a right-move by +1 and a left-move by −1, we can describe the probability space here as the set of all words of length n over the alphabet {±1}, each having equal probability 1/2^n. For instance, one possible outcome is (+1, +1, −1, ..., −1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X_1 + X_2 + ··· + X_n, where X_i = +1 if the ith toss is Heads, and X_i = −1 otherwise.

Now obviously we have E[X] = 0. The easiest rigorous way to see this is to note that E[X_i] = (1/2 × 1) + (1/2 × (−1)) = 0, so by linearity of expectation E[X] = ∑_{i=1}^{n} E[X_i] = 0. Thus after n steps, my expected position is 0!

But of course this is not very informative, and is due to the fact that positive and negative deviations from 0 cancel out. What the above question is really asking is: what is the expected value of |X|, our distance from 0? Unfortunately, computing the expected value of |X| turns out to be a little awkward, due to the absolute value operator. Therefore, rather than consider the r.v. |X|, we will instead look at the r.v. X^2. Notice that this also has the effect of making all deviations from 0 positive, so it should also give a good measure of the distance traveled. However, because it is the squared distance, we will need to take a square root at the end.

Let's calculate E[X^2]:

    E[X^2] = E[(X_1 + X_2 + ··· + X_n)^2]
           = E[ ∑_{i=1}^{n} X_i^2 + ∑_{i≠j} X_i X_j ]
           = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j]

In the last line here, we used linearity of expectation. To proceed, we need to compute E[X_i^2] and E[X_i X_j] (for i ≠ j). Let's consider first X_i^2. Since X_i can take on only the values ±1, clearly X_i^2 = 1 always, so E[X_i^2] = 1. What about E[X_i X_j]? Since X_i and X_j are independent, it is the case that E[X_i X_j] = E[X_i]E[X_j] = 0.¹ Plugging these values into the above equation gives

    E[X^2] = (n × 1) + 0 = n.

¹ For independent random variables X, Y, we have E[XY] = E[X]E[Y]. You are strongly encouraged to prove this as an exercise, along similar lines to our proof of linearity of expectation in an earlier lecture. Note that E[XY] = E[X]E[Y] is false for general r.v.'s X, Y; to see this, just look at the present example, where we have E[X_i^2] ≠ E[X_i]E[X_i].
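To make this concrete, here is a minimal Python sketch that estimates E[X] and E[X^2] by simulating the walk; the values of n and the number of trials are arbitrary illustrative choices, not part of the derivation. The two estimates should come out close to 0 and to n respectively.

    import random

    def random_walk_position(n):
        """Position after n steps of the +1/-1 random walk."""
        return sum(random.choice((+1, -1)) for _ in range(n))

    n, trials = 100, 20000                       # illustrative choices
    samples = [random_walk_position(n) for _ in range(trials)]
    print(sum(samples) / trials)                 # estimate of E[X], close to 0
    print(sum(x * x for x in samples) / trials)  # estimate of E[X^2], close to n = 100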

So we see that our expected squared distance from 0 is n. One interpretation of this is that we might expect to be a distance of about √n away from 0 after n steps. However, we have to be careful here: we cannot simply argue that E[|X|] = √(E[X^2]) = √n. (Why not?) We will see shortly (see Chebyshev's Inequality below) how to make precise deductions about |X| from knowledge of E[X^2].

For the moment, however, let's agree to view √(E[X^2]) as an intuitive measure of spread of the r.v. X. In fact, for a more general r.v. with expectation E[X] = µ, what we are really interested in is E[(X − µ)^2], the expected squared distance from the mean. In our random walk example, we had µ = 0, so E[(X − µ)^2] just reduces to E[X^2].

Definition 19.1 (variance): For a r.v. X with expectation E[X] = µ, the variance of X is defined to be

    Var(X) = E[(X − µ)^2].

The square root √Var(X) is called the standard deviation of X.

The point of the standard deviation is merely to undo the squaring in the variance. Thus the standard deviation is on the same scale as the r.v. itself. Since the variance and standard deviation differ just by a square, it really doesn't matter which one we choose to work with, as we can always compute one from the other immediately. We shall usually use the variance. For the random walk example above, we have that Var(X) = n, and the standard deviation of X is √n.

The following easy observation gives us a slightly different way to compute the variance that is simpler in many cases.

Theorem 19.1: For a r.v. X with expectation E[X] = µ, we have Var(X) = E[X^2] − µ^2.

Proof: From the definition of variance, we have

    Var(X) = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − 2µE[X] + µ^2 = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2.

In the third step here, we used linearity of expectation.

Let's see some examples of variance calculations.

1. Fair die. Let X be the score on the roll of a single fair die. Recall from an earlier lecture that E[X] = 7/2. So we just need to compute E[X^2], which is a routine calculation:

    E[X^2] = (1/6)(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = 91/6.

Thus from Theorem 19.1,

    Var(X) = E[X^2] − (E[X])^2 = 91/6 − 49/4 = 35/12.
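As a quick check of this example, here is a short Python sketch that redoes the calculation with exact fractions, using the formula from Theorem 19.1; it is only an illustration, not part of the derivation.

    from fractions import Fraction

    die = range(1, 7)                          # outcomes of a single fair die
    p = Fraction(1, 6)                         # probability of each outcome
    mean = sum(p * x for x in die)             # E[X] = 7/2
    mean_sq = sum(p * x * x for x in die)      # E[X^2] = 91/6
    print(mean, mean_sq, mean_sq - mean**2)    # variance = 35/12, by Theorem 19.1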

2. Biased coin. Let X be the number of Heads in n tosses of a biased coin with Heads probability p (i.e., X has the binomial distribution with parameters n, p). We already know that E[X] = np. Let X_i = 1 if the ith toss is Heads, and X_i = 0 otherwise. We can then write X = X_1 + X_2 + ··· + X_n, and then

    E[X^2] = E[(X_1 + X_2 + ··· + X_n)^2]
           = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j]
           = (n × p) + (n(n − 1) × p^2)
           = n^2 p^2 + np(1 − p).

In the third line here, we have used the facts that E[X_i^2] = p, and that E[X_i X_j] = E[X_i]E[X_j] = p^2 (since X_i, X_j are independent). Note that there are n(n − 1) pairs i, j with i ≠ j. Finally, we get that

    Var(X) = E[X^2] − (E[X])^2 = n^2 p^2 + np(1 − p) − (np)^2 = np(1 − p).

As an example, for a fair coin the expected number of Heads in n tosses is n/2, and the standard deviation is √n/2.

Notice that in fact Var(X) = ∑_i Var(X_i), and the same was true in the random walk example. This is no coincidence, and it depends on the fact that the X_i are mutually independent. (Exercise: Verify this.) So in the independent case we can compute variances of sums very easily. As we shall see in example 3, however, when the r.v.'s are not mutually independent, the variance is not linearly additive.

3. Number of fixed points. Let X be the number of fixed points in a random permutation of n items (i.e., the number of students in a class of size n who receive their own homework after shuffling). We saw in an earlier lecture that E[X] = 1 (regardless of n). To compute E[X^2], write X = X_1 + X_2 + ··· + X_n, where X_i = 1 if i is a fixed point, and X_i = 0 otherwise. Then as usual we have

    E[X^2] = ∑_{i=1}^{n} E[X_i^2] + ∑_{i≠j} E[X_i X_j].    (1)

Since X_i is an indicator r.v., we have that E[X_i^2] = Pr[X_i = 1] = 1/n. In this case, however, we have to be a bit more careful about E[X_i X_j]: note that we cannot claim as before that this is equal to E[X_i]E[X_j], because X_i and X_j are not independent (do you see why not?). But since both X_i and X_j are indicators, we can compute E[X_i X_j] directly as follows:

    E[X_i X_j] = Pr[X_i = 1 ∧ X_j = 1] = Pr[both i and j are fixed points] = 1/(n(n − 1)).

[Check that you understand the last step here.] Plugging this into equation (1) we get

    E[X^2] = (n × (1/n)) + (n(n − 1) × 1/(n(n − 1))) = 1 + 1 = 2.

Thus Var(X) = E[X^2] − (E[X])^2 = 2 − 1 = 1. In other words, the variance and the mean are both equal to 1. Like the mean, the variance is also independent of n. Intuitively at least, this means that it is unlikely that there will be more than a small number of fixed points even when the number of items, n, is very large.
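Since a mean and variance that do not grow with n may seem surprising, here is a small Python simulation sketch of example 3; the values of n and the number of trials are arbitrary illustrative choices. Both the sample mean and the sample variance should come out close to 1.

    import random

    def fixed_points(n):
        """Number of fixed points of a uniformly random permutation of n items."""
        perm = list(range(n))
        random.shuffle(perm)
        return sum(1 for i, x in enumerate(perm) if i == x)

    n, trials = 50, 20000                      # illustrative choices
    samples = [fixed_points(n) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((x - mean) ** 2 for x in samples) / trials
    print(mean, var)                           # both should be close to 1, independent of n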

Chebyshev's Inequality

We have seen that, intuitively, the variance (or, more correctly, the standard deviation) is a measure of spread, or deviation from the mean. Our next goal is to make this intuition quantitatively precise. What we can show is the following:

Theorem 19.2: [Chebyshev's Inequality] For a random variable X with expectation E[X] = µ, and for any α > 0,

    Pr[|X − µ| ≥ α] ≤ Var(X)/α^2.

Before proving Chebyshev's inequality, let's pause to consider what it says. It tells us that the probability of any given deviation, α, from the mean, either above it or below it (note the absolute value sign), is at most Var(X)/α^2. As expected, this deviation probability will be small if the variance is small. An immediate corollary of Chebyshev's inequality is the following:

Corollary 19.3: For a random variable X with expectation E[X] = µ, and standard deviation σ = √Var(X),

    Pr[|X − µ| ≥ βσ] ≤ 1/β^2.

Proof: Plug α = βσ into Chebyshev's inequality.

So, for example, we see that the probability of deviating from the mean by more than (say) two standard deviations on either side is at most 1/4. In this sense, the standard deviation is a good working definition of the width or spread of a distribution.

We should now go back and prove Chebyshev's inequality. The proof will make use of the following simpler bound, which applies only to non-negative random variables (i.e., r.v.'s which take only values ≥ 0).

Theorem 19.4: [Markov's Inequality] For a non-negative random variable X with expectation E[X] = µ, and any α > 0,

    Pr[X ≥ α] ≤ E[X]/α.

Proof: From the definition of expectation, we have

    E[X] = ∑_a a × Pr[X = a]
         = ∑_{a<α} a × Pr[X = a] + ∑_{a≥α} a × Pr[X = a]
         ≥ ∑_{a≥α} a × Pr[X = a]
         ≥ α ∑_{a≥α} Pr[X = a]
         = α Pr[X ≥ α].

The crucial step here is the third line, where we have used the fact that X takes on only non-negative values, and consequently a ≥ 0 and Pr[X = a] ≥ 0. (This step is not valid otherwise, since if X can take on negative values, then we might have a × Pr[X = a] < 0.)

Now we can prove Chebyshev's inequality quite easily.

Proof of Theorem 19.2: Define the r.v. Y = (X − µ)^2. Note that E[Y] = E[(X − µ)^2] = Var(X). Also, notice that the probability we are interested in, Pr[|X − µ| ≥ α], is exactly the same as Pr[Y ≥ α^2]. (This is because the inequality |X − µ| ≥ α is true if and only if (X − µ)^2 ≥ α^2 is true, i.e., if and only if Y ≥ α^2 is true.) Moreover, Y is obviously non-negative, so we can apply Markov's inequality to it to get

    Pr[Y ≥ α^2] ≤ E[Y]/α^2 = Var(X)/α^2.

This completes the proof.
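As a concrete illustration of both bounds, here is a small Python sketch that checks them exactly for the fair die of example 1; the thresholds 5 and 2 are arbitrary illustrative choices.

    from fractions import Fraction

    die = range(1, 7)
    p = Fraction(1, 6)
    mu = sum(p * x for x in die)                          # E[X] = 7/2
    var = sum(p * (x - mu) ** 2 for x in die)             # Var(X) = 35/12

    # Markov's inequality: Pr[X >= 5] <= E[X]/5
    print(sum(p for x in die if x >= 5), "<=", mu / 5)    # 1/3 <= 7/10

    # Chebyshev's inequality: Pr[|X - mu| >= 2] <= Var(X)/2^2
    print(sum(p for x in die if abs(x - mu) >= 2), "<=", var / 4)   # 1/3 <= 35/48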

Let's apply Chebyshev's inequality to answer our question about the random walk at the beginning of this lecture note. Recall that X is our position after n steps, and that E[X] = 0 and Var(X) = n. Corollary 19.3 says that, for any β > 0,

    Pr[|X| ≥ β√n] ≤ 1/β^2.

Thus, for example, if we take n = 10^6 steps, the probability that we end up more than 10000 steps away from our starting point is at most 1/100.

Here are a few more examples of applications of Chebyshev's inequality (you should check the algebra in them):

1. Coin tosses. Let X be the number of Heads in n tosses of a fair coin. The probability that X deviates from µ = n/2 by more than √n is at most 1/4. The probability that it deviates by more than 5√n is at most 1/100.

2. Fixed points. Let X be the number of fixed points in a random permutation of n items; recall that E[X] = Var(X) = 1. Thus the probability that more than (say) 10 students get their own homeworks after shuffling is at most 1/100, however large n is.²

² In more detail: Pr[X > 10] = Pr[X ≥ 11] ≤ Pr[|X − 1| ≥ 10] ≤ Var(X)/100 = 1/100.
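Returning to example 1 above, a small Python simulation sketch (with illustrative values of n and the number of trials) shows that the empirical deviation probability is typically far below the Chebyshev bound of 1/4, as expected, since Chebyshev's inequality is often quite loose.

    import random
    from math import sqrt

    def heads_count(n):
        """Number of Heads in n fair coin tosses."""
        return sum(random.getrandbits(1) for _ in range(n))

    n, trials = 1000, 2000                     # illustrative choices
    deviations = sum(1 for _ in range(trials)
                     if abs(heads_count(n) - n / 2) > sqrt(n))
    print(deviations / trials)                 # empirical probability; Chebyshev gives <= 1/4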