Math 425 — Intro to Probability
Lecture 37
Kenneth Harris (kaharri@umich.edu)
Department of Mathematics, University of Michigan
April 8, 2009

Example: Coin tossing

Consider a Bernoulli trials process with IID indicator variables X_1, X_2, ... denoting whether each trial was a success or a failure. Suppose the probability of success is p, so

    E[X_i] = p,    Var(X_i) = p(1 − p).

Let A_n = (X_1 + ... + X_n)/n be the sample average over n trials. Then

    E[A_n] = p,    Var(A_n) = p(1 − p)/n.

By Chebyshev's inequality, for any ε > 0,

    P{ |A_n − p| ≥ ε } ≤ p(1 − p)/(nε²).

Example: Coin tossing (continued)

The variance σ² = p(1 − p) has a maximum value of 1/4, achieved at p = 1/2. Plugging this back into Chebyshev's inequality gives a bound on the deviation of the sample average from its mean that does not depend on p:

    P{ |A_n − p| ≥ ε } ≤ p(1 − p)/(nε²) ≤ 1/(4nε²).

[Figure: plot of p(1 − p) against p, with maximum 1/4 at p = 1/2.]

Example: Which coin?

We have two coins: one is fair and the other produces heads with probability 3/4. One coin is picked at random. How many tosses suffice for us to be 95 percent sure of which coin we have?

To make the problem more concrete: if the proportion of heads is less than 0.625 (the midpoint between 0.5 and 0.75), we will guess the coin was fair; if the proportion of heads is greater than 0.625, we will guess the biased coin. How many tosses suffice for 95 percent certainty that the sample average will not deviate from its mean by more than ε = 0.125?
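Chebyshev's bound can be checked numerically. A minimal Monte Carlo sketch (not part of the original lecture; the parameters p = 0.3, n = 100, ε = 0.1, the trial count, and the seed are arbitrary choices):

```python
import random

def deviation_prob(p, n, eps, trials=20000, seed=0):
    """Estimate P(|A_n - p| >= eps) for the sample mean of n Bernoulli(p) trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))
        if abs(successes / n - p) >= eps:
            hits += 1
    return hits / trials

p, n, eps = 0.3, 100, 0.1
estimate = deviation_prob(p, n, eps)
chebyshev = p * (1 - p) / (n * eps ** 2)   # Chebyshev's upper bound = 0.21
print(estimate, chebyshev)
```

The empirical deviation probability typically lands far below the Chebyshev bound of 0.21, which illustrates how conservative the inequality is.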
Example: Which coin? (continued)

By Chebyshev's inequality,

    P{ |A_n − p| ≥ ε } ≤ 1/(4nε²).

We want n large enough that the error probability is at most 5%:

    1/(4nε²) ≤ 0.05,    equivalently    n ≥ 1/(4(0.05)ε²) = 5/ε².

We now have a bound on the number of trials needed without needing to know the mean or the variance. For ε = 0.125, choose n so that

    n ≥ 5/(0.125)² = 320.

By tossing the coin n ≥ 320 times we can be 95% certain the sample average is within 0.125 of the true bias p of the coin toward heads:

    P{ |A_n − p| ≥ 0.125 } ≤ 0.05.

So: toss the coin 320 times and count heads. If fewer than 200 heads appear, guess the fair coin. If more than 200 heads appear, guess the biased coin. If exactly 200 heads appear, laugh at your (bad?) luck. You can be 95 percent certain you chose the right coin.

Degree of Certainty vs. Number of Trials

To achieve certainty q that we are within ε = 0.125 of the mean requires n trials, where

    n ≥ 1/(4(1 − q)(0.125)²).

    Degree of certainty   Number of trials
    50%                   32
    75%                   64
    90%                   160
    95%                   320
    99%                   1,600
    99.9%                 16,000

Sums of Random Variables

Let X_1, X_2, ... be IID (independent and identically distributed) random variables with common mean µ and variance σ². Let S_n = X_1 + X_2 + ... + X_n, so the statistics for S_n are

    E[S_n] = nµ,    Var(S_n) = nσ²,    StDev(S_n) = √n σ.

So the distribution of S_n tends to shift to the right and flatten out as n → ∞.

[Figure: densities of S_n shifting right and spreading as n grows.]
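The decision procedure above is easy to simulate. A rough sketch (not from the slides; the trial count and seed are arbitrary, and ties at exactly 200 heads are resolved toward the fair coin here), which also shows that Chebyshev's guarantee is conservative — the observed accuracy is usually much higher than 95%:

```python
import random

def identify_coin(n=320, trials=5000, seed=1):
    """Pick a coin at random, toss it n times, and guess by the 0.625 cutoff."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        biased = rng.random() < 0.5           # which coin was picked
        p = 0.75 if biased else 0.5
        heads = sum(rng.random() < p for _ in range(n))
        guess_biased = heads > 200            # cutoff: 0.625 * 320 = 200 heads
        if guess_biased == biased:
            correct += 1
    return correct / trials

success_rate = identify_coin()
print(success_rate)
```

With 320 tosses the two head counts (mean 160 vs. mean 240) are separated by many standard deviations, so the observed success rate is essentially 1, well above the guaranteed 95%.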
Standardization

We standardize the sums S_n to guarantee they have the same mean and variance.

Definition. Let X_1, X_2, ... be IID random variables with common mean µ and variance σ², and let S_n = X_1 + X_2 + ... + X_n. The standardization of S_n is the random variable

    S_n* = (S_n − nµ)/(√n σ).

Proposition. The statistics of the standardization S_n* of S_n are

    E[S_n*] = 0,    Var(S_n*) = 1    for any n.

Proof of Proposition

Let X_1, X_2, ... be IID random variables with common mean µ and variance σ². Let S_n = X_1 + X_2 + ... + X_n, so that the statistics of S_n are

    E[S_n] = nµ,    Var(S_n) = nσ².

The standardization is S_n* = (S_n − nµ)/(√n σ), so its statistics are

    E[S_n*] = (E[S_n] − nµ)/(√n σ) = 0,
    Var(S_n*) = Var((S_n − nµ)/(√n σ)) = Var(S_n)/(nσ²) = 1.

Convergence of the Central Limit Theorem

Theorem (Central Limit Theorem, CLT). Let X_1, X_2, ... be a sequence of IID random variables having mean µ and variance σ². Let S_n = X_1 + X_2 + ... + X_n and let S_n* be its standardization

    S_n* = (S_n − nµ)/(√n σ).

Then the distribution of the random variable S_n* tends to the standard normal distribution as n → ∞. That is, for any −∞ < a < ∞,

    P{S_n* ≤ a} → Φ(a) = (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx    as n → ∞.

The CLT only states that for each a, P{S_n* ≤ a} → Φ(a), where Φ is the standard normal distribution function. The CLT leaves open a couple of important issues:
1. How large must n be for Φ(a) to be close to P{S_n* ≤ a}?
2. Is there a single n which works for all a, or will n vary with each a?
It would be very BAD if it turned out n was usually very large, or even if the tightness of the approximation for a choice of n depended on a.
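The Proposition can be sanity-checked empirically. A small sketch, assuming Uniform(0, 1) summands (an arbitrary choice; any IID distribution with finite variance would do, and the sample counts are arbitrary):

```python
import math
import random

def standardized_sum_stats(n=30, trials=20000, seed=2):
    """Empirical mean and variance of S_n* for IID Uniform(0,1) summands."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)        # Uniform(0,1): mean 1/2, sd 1/sqrt(12)
    samples = []
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        samples.append((s - n * mu) / (math.sqrt(n) * sigma))
    m = sum(samples) / trials
    v = sum((x - m) ** 2 for x in samples) / trials
    return m, v

mean, var = standardized_sum_stats()
print(mean, var)   # should be close to 0 and 1
```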
Convergence of the Central Limit Theorem (continued)

Fortunately, this does not happen in MOST circumstances. The following result states that the convergence in the CLT is on the order of 1/√n, independently of a.

The Berry–Esseen Theorem. If X_1, X_2, ... are IID random variables with finite mean µ, variance σ², and third moment E[|X_i|³], then there is a constant C (which does not depend on a or n) such that for any real a and integer n,

    |P{S_n* ≤ a} − Φ(a)| ≤ C/√n.

The rule of thumb is that the central limit theorem provides a good approximation when n ≥ 30.

Binomial Distribution, p = 0.5

[Figure: distribution of S_n* for Bernoulli random variables with p = 0.5, for n = 1, 5, 25, 50, against the standard normal curve.]

Binomial Distribution, p = 1/4

[Figure: distribution of S_n* for Bernoulli random variables with p = 1/4, for n = 1, 5, 25, 50, against the standard normal curve.]

Arbitrary Discrete Distribution

[Figure: distribution of S_n* for an arbitrary discrete distribution on {0, 1, 2, 3, 4} (the probabilities are garbled in the source), for n = 1, 5, 25, 50, against the standard normal curve.]
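The Berry–Esseen rate can be observed numerically in the Bernoulli case with p = 1/2. A sketch (not from the slides): it measures the largest gap between the CDF of S_n* and Φ over the jump points, for n = 25 and n = 100. Quadrupling n should roughly halve the gap, consistent with the C/√n rate.

```python
import math

def phi(a):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

def max_cdf_gap(n, p=0.5):
    """max over k of |P(S_n <= k) - Phi((k - mu)/sigma)| for S_n ~ Binomial(n, p)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    gap, cdf, pmf = 0.0, 0.0, (1 - p) ** n    # pmf starts at P(S_n = 0)
    for k in range(n + 1):
        cdf += pmf                            # cdf is now P(S_n <= k)
        gap = max(gap, abs(cdf - phi((k - mu) / sigma)))
        if k < n:                             # recurrence: pmf(k) -> pmf(k+1)
            pmf *= (n - k) / (k + 1) * p / (1 - p)
    return gap

print(max_cdf_gap(25), max_cdf_gap(100))
```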
Arbitrary Discrete Distribution (continued)

Distribution of S_n* for an arbitrary (strongly bimodal) discrete distribution:

    x    0      1      2      3      4
    p    0.46   0.04   0.02   0.02   0.46

[Figure: distribution of S_n* for this distribution, for n = 1, 5, 25, 50, against the standard normal curve.]

Standardized Continuous Density

Let X_1, X_2, ... be IID continuous random variables with common mean µ and variance σ². We can compute the density f_{S_n}(x) of the sum S_n = X_1 + X_2 + ... + X_n using convolution. To compute the density f_{S_n*}(x) of the standardized sum

    S_n* = (S_n − nµ)/(√n σ),

use Theorem 5.7.1 (Ross, page 243):

    F_{S_n*}(x) = F_{S_n}(√n σ x + nµ),
    f_{S_n*}(x) = √n σ f_{S_n}(√n σ x + nµ).

Uniform Distribution

[Figure: density of S_n* for the uniform distribution on [0, 1], for n = 1, 2, 5, 10, plotted against the standard normal density.]

Exponential Distribution

Let X_1, X_2, ... be IID exponential random variables with common parameter λ, so

    E[X_i] = 1/λ,    Var(X_i) = 1/λ².

Recall that the sum of n exponential random variables with parameter λ is a gamma-distributed random variable with parameters (n, λ) (Section 6.3 of Ross). The density of S_n in this case is

    f_{S_n}(x) = λe^{−λx}(λx)^{n−1}/(n − 1)!,

and the standardized density is

    f_{S_n*}(x) = (√n/λ) f_{S_n}((√n x + n)/λ).
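The standardized gamma density formula above can be evaluated directly. A sketch (with λ = 1; the comparison point x = 0 is an arbitrary choice) showing f_{S_n*}(0) approaching the standard normal density 1/√(2π) ≈ 0.3989 as n grows:

```python
import math

def gamma_density(x, n, lam=1.0):
    """Density of S_n = X_1 + ... + X_n for IID Exponential(lam): Gamma(n, lam)."""
    if x <= 0:
        return 0.0
    return lam * math.exp(-lam * x) * (lam * x) ** (n - 1) / math.factorial(n - 1)

def standardized_density(x, n, lam=1.0):
    """Density of S_n* = (S_n - n/lam) / (sqrt(n)/lam), via the change of variables."""
    return (math.sqrt(n) / lam) * gamma_density((math.sqrt(n) * x + n) / lam, n, lam)

def normal_density(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# The standardized gamma density approaches the standard normal as n grows.
for n in (3, 10, 30):
    print(n, standardized_density(0.0, n), normal_density(0.0))
```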
Exponential Distribution (continued)

[Figure: density of S_n* for the exponential distribution with parameter λ = 1, for n = 1, 3, 10, 30, plotted against the standard normal density.]

Example: Round-off error

Fifty numbers are rounded off to the nearest integer and then summed. Suppose that the individual round-off errors are uniformly distributed over (−0.5, 0.5). What is the probability that the computed sum differs from the exact sum by more than 3?

Solution. Let X_i (i = 1, ..., 50) be the round-off error on the ith number, and let X = X_1 + X_2 + ... + X_50 be the total round-off error. The X_i are IID and uniformly distributed, so

    E[X_i] = 0,    Var(X_i) = 1/12,
    E[X] = 0,      Var(X) = 50/12.

Apply the CLT to approximate the probability P{|X| > 3}.

Example: Round-off error (continued)

First standardize X, then apply the normal approximation:

    P{|X| > 3} = P{|X*| > (3 − 0)/√(50/12)} = P{|X*| > 6√3/(5√2)}
               = P{X* > 6√3/(5√2)} + P{X* < −6√3/(5√2)}
               ≈ 2 − 2Φ(6√3/(5√2)) ≈ 2 − 2(0.9292) = 0.1416,

where 6√3/(5√2) ≈ 1.47.

Example: Grading errors

A student's grade is the average of 30 assignments, where each assignment is recorded as an integer out of 100 possible points. Suppose that the instructor makes an error in grading of ±k with probability ε/k, where 1 ≤ k ≤ 5 and ε = 1/20. The distribution of errors is

    k    0         ±1     ±2     ±3     ±4     ±5
    p    463/600   1/20   1/40   1/60   1/80   1/100

The final grade is obtained by averaging the 30 assignment grades. What is the probability that the "correct average grade" and the recorded average grade differ by less than 0.05 for a given student?
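Before applying the CLT, the error distribution is worth sanity-checking with exact rational arithmetic (a side computation, not from the slides): the probabilities should sum to 1, the mean should be 0, and the variance works out to 3/2.

```python
from fractions import Fraction as F

# Error distribution: P(0) = 463/600, P(+k) = P(-k) = 1/(20k) for k = 1..5.
dist = {0: F(463, 600)}
for k in range(1, 6):
    dist[k] = dist[-k] = F(1, 20 * k)

total = sum(dist.values())                          # must be 1
mean = sum(k * p for k, p in dist.items())          # 0 by symmetry
var = sum(k * k * p for k, p in dist.items()) - mean ** 2
print(total, mean, var)   # 1 0 3/2
```

So Var(X_i) = 3/2 = 1.5, the value used in the solution.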
Example: Grading errors (continued)

Let X_i (i = 1, 2, ..., 30) be the error between the actual score on the ith assignment and the recorded score. We will assume the errors are independent. Let X = X_1 + X_2 + ... + X_30, the sum of the errors. The statistics computed from the distribution are

    E[X_i] = 0,    Var(X_i) = 1.5,
    E[X] = 0,      Var(X) = (30)(1.5) = 45.

Apply the CLT to approximate the probability P{−0.05 < X/30 < 0.05}. First standardize X, then apply the normal approximation:

    P{−0.05 < X/30 < 0.05} = P{−1.5 < X < 1.5}
        = P{−1.5/√45 < (X − 0)/√45 < 1.5/√45}
        ≈ Φ(1.5/√45) − Φ(−1.5/√45)
        = 2Φ(0.22) − 1 ≈ 2(0.587) − 1 = 0.174.

Thus, there is about a 17.4% chance that the student's recorded average is accurate to within 0.05.

Example: Grading errors (continued)

What is the probability that no error is made? That is, approximate P{X = 0}, where X = X_1 + ... + X_30. Apply the CLT with the continuity correction, since X is a discrete random variable and we want the probability at a single possible value:

    P{X = 0} = P{−0.5 ≤ X ≤ 0.5}
        = P{−0.5/√45 ≤ (X − 0)/√45 ≤ 0.5/√45}
        ≈ Φ(0.5/√45) − Φ(−0.5/√45)
        = 2Φ(0.07) − 1 ≈ 2(0.5279) − 1 = 0.0558.

There is only about a 5.6% chance that the recorded grade is exactly correct.

Example: Cars on a bridge

Based on data from similar bridges, the span of a certain bridge can withstand a load, without structural damage, that is normally distributed with mean 400 and standard deviation 40 (in units of 1000 pounds). Suppose the weight of a car is a random variable with mean 3 and standard deviation 0.3 (in units of 1000 pounds). Approximately how many cars would have to be on the bridge span for the probability of structural damage to exceed 10%?
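The two normal approximations for the grading example (≈ 0.174 and ≈ 0.0558) can be compared against a direct simulation. A sketch (trial count and seed are arbitrary; with this many trials the estimates are only good to a couple of decimal places):

```python
import random

def simulate_grading(trials=40000, seed=3):
    """Simulate the 30 per-assignment errors; tally the two events of interest."""
    rng = random.Random(seed)
    values = [0] + [k for k in range(1, 6)] + [-k for k in range(1, 6)]
    weights = [463 / 600] + [1 / (20 * k) for k in range(1, 6)] * 2
    small_avg = exact = 0
    for _ in range(trials):
        x = sum(rng.choices(values, weights=weights, k=30))  # total error X
        if abs(x / 30) < 0.05:       # average within 0.05, i.e. X in {-1, 0, 1}
            small_avg += 1
        if x == 0:                   # no net error
            exact += 1
    return small_avg / trials, exact / trials

p_small, p_zero = simulate_grading()
print(p_small, p_zero)
```

Both estimates should land near the CLT values, with the continuity-corrected approximation for P{X = 0} noticeably rougher since it targets a single lattice point.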
Example: Cars on a bridge (continued)

Let X_1, X_2, ... be random variables denoting the weights of the cars on the bridge, and let S_n = X_1 + X_2 + ... + X_n. Let Y denote the load the bridge can withstand. Then (where units are in 1000 pounds)

    E[X_i] = 3,     SD(X_i) = 0.3,
    E[S_n] = 3n,    SD(S_n) = 0.3√n,
    E[Y] = 400,     SD(Y) = 40.

We want to find n so that the probability P{Y ≤ S_n}, equivalently P{0 ≤ S_n − Y}, exceeds 10%. Assume S_n and Y are independent, so

    µ_n = E[S_n − Y] = 3n − 400,
    σ_n = SD(S_n − Y) = √((0.3)² n + (40)²).

First standardize, then apply the normal approximation:

    P{0 ≤ S_n − Y} = P{(0 − µ_n)/σ_n ≤ (S_n − Y − µ_n)/σ_n}
        ≈ 1 − Φ(−µ_n/σ_n) = Φ(µ_n/σ_n).

Since Φ(x) increases as x increases, and Φ(−1.29) ≈ 0.10, it is sufficient to have

    µ_n/σ_n ≥ −1.29,

or equivalently

    (3n − 400)/√((0.3)² n + (40)²) ≥ −1.29.

This reduces (after squaring, valid on the range where 3n − 400 < 0) to the quadratic inequality

    9n² − 2400n + 157,337 ≤ 0.

The smallest integer n satisfying this inequality is n = 117. If there are 117 or more cars on the bridge, then there is a greater than 10% chance the load on the bridge is greater than it can withstand without damage.
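Rather than solving the quadratic by hand, the smallest n can be found by scanning the normal approximation directly. A sketch (computing Φ via math.erf; the linear scan is a deliberately simple search):

```python
import math

def damage_probability(n, mu_y=400, sd_y=40, mu_car=3, sd_car=0.3):
    """Normal approximation to P(S_n >= Y) = Phi(mu_n / sigma_n)."""
    mu_n = mu_car * n - mu_y                          # E[S_n - Y]
    sigma_n = math.sqrt(sd_car ** 2 * n + sd_y ** 2)  # SD(S_n - Y)
    return 0.5 * (1 + math.erf((mu_n / sigma_n) / math.sqrt(2)))

# Scan for the smallest n with damage probability above 10%.
n = 1
while damage_probability(n) <= 0.10:
    n += 1
print(n)   # -> 117
```

This agrees with the quadratic-inequality answer: at n = 116 the approximate damage probability is just under 10%, and at n = 117 it is about 11%.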