Lecture 18: Central Limit Theorem. Lisa Yan August 6, 2018


1 Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018

2 Announcements
- PS5 due today (pain poll)
- PS6 out today; due next Monday 8/13 (1:30pm), will not be accepted after Wed 8/15
  - Programming part: Java, C, C++, or Python
- Lecture Notes published for all material

3 Summary from last time
- Markov's inequality: P(X ≥ a) ≤ E[X]/a for all a > 0, if X ≥ 0
- Chebyshev's inequality: P(|X − µ| ≥ k) ≤ σ²/k² for all k > 0, if E[X] = µ, Var(X) = σ²
- One-sided Chebyshev's inequality (and others): P(X ≥ a) ≤ σ²/(σ² + a²) for all a > 0, if E[X] = 0, Var(X) = σ²
- Jensen's inequality: E[g(X)] ≥ g(E[X]) if g''(x) ≥ 0 for all x
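As a quick illustration of how loose these bounds can be (my own sketch, not from the lecture; assumes SciPy is available), compare them against the exact tail of X ~ Exp(1), which has µ = σ² = 1:

```python
from scipy.stats import expon

# X ~ Exp(1): E[X] = 1, Var(X) = 1. Compare the exact tail P(X >= 3) with the bounds.
a = 3
exact = 1 - expon.cdf(a)              # e^{-3}, about 0.0498
markov = 1 / a                        # Markov:    P(X >= a) <= E[X]/a
chebyshev = 1 / (a - 1) ** 2          # Chebyshev: P(X >= a) <= P(|X - 1| >= a - 1) <= sigma^2/(a-1)^2
print(exact, markov, chebyshev)       # the bounds hold, but they are loose
```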

4 Bounds on sample mean
Consider a sample of n i.i.d. random variables X_1, X_2, ..., X_n, where each X_i has distribution F with E[X_i] = µ and Var(X_i) = σ².
Sample mean: X̄ = (1/n) Σ_{i=1}^n X_i, with E[X̄] = µ and Var(X̄) = σ²/n.

5 Summary from last time
Weak Law of Large Numbers: lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0 for any ε > 0
→ the probability of a deviation goes to zero in the limit
→ we can still see deviations (of at least ε) arbitrarily far out
→ there can be an infinite number of n with non-zero probability of deviation
Strong Law of Large Numbers: P(lim_{n→∞} X̄ = µ) = 1
→ implies the WLLN
→ after some number of samples n, the event |X̄ − µ| ≥ ε never occurs again
→ there are only a finite number of n with non-zero probability of deviation

6 Where are we going with all of this?
As engineers, we want to:
1. Model things that can be random (RVs).
2. Find average values of situations that let us make good decisions (expectation).
We perform multiple experiments to help us with engineering. Counts and averages are both sums of RVs.
Weeks 4 and 5: learning how to model multiple RVs
This week (week 6): finding expectations of complex situations; defining sampling and using existing data; mathematical bounds w.r.t. modeling the average
Week 7: bringing it all together
- Sampling + average = Central Limit Theorem
- Samples + modeling = finding the best model parameters given data

7 Goals for today
- Central Limit Theorem!
- Confidence intervals, re-defined
- Estimates for sums of IID RVs
- Introduction to parameter estimation

8 Central Limit Theorem
If E[X_i] = µ and Var(X_i) = σ², then (approximately, for large n):
Sample means of IID RVs are normally distributed: X̄ = (1/n) Σ_{i=1}^n X_i ~ N(µ, σ²/n)
Sums of IID RVs are normally distributed: Σ_{i=1}^n X_i ~ N(nµ, nσ²)
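This is easy to sanity-check by simulation. A minimal sketch (my own, not from the slides; assumes NumPy): draw many samples of n deliberately non-normal RVs and verify that the sample means have mean µ and variance σ²/n.

```python
import numpy as np

# Sketch: sample means of n IID RVs should look like N(mu, sigma^2/n).
# Exponential(1) is deliberately non-normal: mu = 1, sigma^2 = 1.
rng = np.random.default_rng(0)
n, trials = 50, 100_000
samples = rng.exponential(scale=1.0, size=(trials, n))
xbar = samples.mean(axis=1)            # one sample mean per trial

print(xbar.mean(), xbar.var())         # ~ mu = 1, ~ sigma^2/n = 1/50 = 0.02
```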

9 CLT explains a lot
[Plot: PMF P(X = x) of X ~ Bin(n = 200, p = ...) with its normal approximation overlaid]
Normal approximation to Binomial when np(1 − p) > 10

10 A short history of the CLT
- 1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre
- 1812: Pierre-Simon Laplace extends de Moivre's work to approximating Bin(n, p) with a Normal
- 1901: Alexandr Lyapunov provides a precise definition and rigorous proof of the CLT
- 2012: Drake releases "Started from the Bottom"
As of July 4th, he has a total of 190 songs. The mean quality of subsamples of songs is normally distributed (thanks to the Central Limit Theorem).

11 Implications of CLT
Anything that is a sum/average of many independent random variables is approximately normal, meaning that in real life many things are normally distributed:
- Exam scores: sum of individual problems
- Movie ratings: averages of independent viewer scores
- Polling: ask 100 people if they will vote for candidate X
  - p̂_1 = (# yes)/100, a sum of Bernoulli RVs (each person independently says yes w.p. p)
  - Repeat this process with different groups to get p̂_1, ..., p̂_n (different sample statistics)
  - Normal distribution over the sample means p̂_k
  - Confidence interval: how likely is it that an estimate for the true p is close?

12 No more convolution when n is large!
For independent discrete random variables X and Y, and Z = X + Y:
p_Z(z) = P(X + Y = z) = Σ_x p_X(x) p_Y(z − x) = Σ_y p_X(z − y) p_Y(y)
p_Z is defined as the convolution of p_X and p_Y.
Sums of IID RVs are normally distributed: Σ_{i=1}^n X_i ~ N(nµ, nσ²)
(caveat: must be independent and identically distributed)
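For contrast, here is the brute-force convolution that the CLT lets us skip when n is large. A small sketch (my own illustration, assuming NumPy): the exact PMF of the sum of two fair dice via convolution.

```python
import numpy as np

# p_Z(z) = sum_x p_X(x) * p_Y(z - x): exact PMF of the sum of two fair six-sided dice.
die = np.full(6, 1 / 6)                     # p_X(x) = 1/6 for x = 1..6
p_sum = np.convolve(die, die)               # values of X + Y run from 2 to 12
print(dict(zip(range(2, 13), np.round(p_sum, 4))))
```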

13 CLT vs. Water speed stat
Water Pokemon speed statistic X_i: E[X_i] = 65, Var(X_i) = 510.
Let Y = the average speed of 50 Water Pokemon. What is the distribution of Y?
Solution: Y = X̄ = (1/50) Σ_{i=1}^{50} X_i
E[Y] = E[X_i] = 65
Var(Y) = Var(X_i)/50 = 510/50 = 10.2
Y ~ N(65, 10.2)
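Once Y's distribution is pinned down, probabilities about the average follow directly from the normal CDF. A quick sketch (my own extension of the example, not from the slides; assumes SciPy):

```python
from scipy.stats import norm

# Y ~ N(65, 10.2): e.g., chance the average speed of the 50 Water Pokemon is between 60 and 70.
y = norm(loc=65, scale=10.2 ** 0.5)
print(y.cdf(70) - y.cdf(60))     # roughly 0.88
```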

14 Confidence interval, fully defined
Consider a sample of IID RVs X_1, X_2, ..., X_n, where n is large:
X̄ = (1/n) Σ_{i=1}^n X_i,  Var(X̄) = σ²/n,  S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)²,  SE = S/√n
def: For large n, a 100(1 − α)% confidence interval of µ is defined as
(X̄ − z_{α/2}·SE, X̄ + z_{α/2}·SE), i.e., X̄ ± z_{α/2}·(SE), where Φ(z_{α/2}) = 1 − (α/2).
ex: 95% confidence interval: α = 0.05, α/2 = 0.025, Φ(z_{α/2}) = 0.975, z_{α/2} = 1.96, giving X̄ ± 1.96·(SE).
In other words: 95% of the time that we sample, the true µ will be in the interval X̄ ± 1.96·(SE).
NOT: µ is 95% likely to be in this particular interval.
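A sketch of this definition in code (my own illustration, not course-provided; assumes NumPy/SciPy):

```python
import numpy as np
from scipy.stats import norm

def confidence_interval(x, alpha=0.05):
    """100(1 - alpha)% confidence interval for the mean of an IID sample x (n assumed large)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)      # SE = S / sqrt(n), with S the sample std (n - 1 divisor)
    z = norm.ppf(1 - alpha / 2)          # z_{alpha/2}; 1.96 when alpha = 0.05
    return xbar - z * se, xbar + z * se

rng = np.random.default_rng(1)
print(confidence_interval(rng.normal(10, 4, size=225)))   # should usually contain the true mean 10
```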

15 SE with confidence interval
Idle CPUs are the bane of our existence. A large company monitors 225 computers (single CPU) for idle hours in order to estimate the average number of idle hours per CPU.
The sample had X̄ = 11.6 hrs and S² = 16.81 hrs² (so S = 4.1 hrs).
Give the 90% confidence interval for the estimate of µ, the mean idle hrs/CPU.
Solution: α = 0.10, α/2 = 0.05, Φ(z_{α/2}) = 0.95, z_{α/2} = 1.645
Standard error: SE = S/√n = 4.1/√225 = 4.1/15
90% confidence interval: X̄ ± z_{α/2}·(SE) → 11.6 ± 1.645·(4.1/15) ≈ (11.15, 12.05)
Interpret: 90% of the time that such an interval is computed, the true µ is in it.
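The arithmetic above can be reproduced in a few lines (my own check, assuming SciPy):

```python
from scipy.stats import norm

xbar, s, n, alpha = 11.6, 4.1, 225, 0.10
z = norm.ppf(1 - alpha / 2)          # ~1.645
se = s / n ** 0.5                    # 4.1 / 15
print(xbar - z * se, xbar + z * se)  # ~ (11.15, 12.05)
```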

16 Break
Attendance: tinyurl.com/cs109summer

17 Sampling distribution
The CLT tells us the sampling distribution of X̄ is approximately normal when n is large.
("large n": n > 30; larger is better, e.g., n > 100)
We can also use the CLT to decide values of n for good confidence intervals.

18 Estimating clock running time
Want to test the runtime of an algorithm. Run the algorithm n times and measure the average time. Know the variance is σ² = 4 sec². Want to estimate the mean runtime µ = t.
How large should n be so that P(average time is within 0.5 of t) = 0.95?
Solution: X̄ = (1/n) Σ_{i=1}^n X_i ~ N(t, σ²/n) (by the Central Limit Theorem)
0.95 = P(−0.5 ≤ X̄ − t ≤ 0.5)
     = P(−0.5/(σ/√n) ≤ (X̄ − t)/(σ/√n) ≤ 0.5/(σ/√n)), where (X̄ − t)/(σ/√n) ~ N(0, 1)  (SD of X̄ is σ/√n)
     = P(−0.5√n/σ ≤ Z ≤ 0.5√n/σ)

19 Estimating clock running time
X̄ = (1/n) Σ_{i=1}^n X_i ~ N(t, σ²/n)
0.95 = P(−0.5 ≤ X̄ − t ≤ 0.5)
     = Φ(0.5√n/σ) − Φ(−0.5√n/σ) = Φ(0.5√n/σ) − (1 − Φ(0.5√n/σ)) = 2Φ(0.5√n/σ) − 1
Solve for n: Φ(0.5√n/σ) = 0.975 → 0.5√n/σ = 1.96 → √n = 1.96·(2)/0.5 = 7.84 → n ≈ 62
What say you, Chebyshev? P(|X̄ − t| ≥ 0.5) ≤ Var(X̄)/(0.5)² = (4/n)/0.25 = 16/n ≤ 0.05 → n ≥ 320.
Thanks for playing, Pafnuty!
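A sketch of the same sample-size calculation in code, plus a simulation check of the resulting coverage (my own illustration, assuming NumPy/SciPy; the true mean t = 3.0 is a made-up value, since only the deviation from t matters):

```python
import numpy as np
from scipy.stats import norm

sigma, eps, target = 2.0, 0.5, 0.95
z = norm.ppf((1 + target) / 2)                 # 1.96
n = int(np.ceil((z * sigma / eps) ** 2))       # CLT answer: 62
print(n)

# Simulation: fraction of runs whose sample mean lands within 0.5 of the true mean t.
rng = np.random.default_rng(2)
t = 3.0                                        # hypothetical true mean runtime (seconds)
means = rng.normal(t, sigma, size=(100_000, n)).mean(axis=1)
print(np.mean(np.abs(means - t) <= eps))       # ~0.95
```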

20 The power of the central limit theorem
Sample means of IID RVs are normally distributed: X̄ = (1/n) Σ_{i=1}^n X_i ~ N(µ, σ²/n)
- Can estimate confidence intervals
- Can give probabilities on the sample mean
Sums of IID RVs are normally distributed: Σ_{i=1}^n X_i ~ N(nµ, nσ²)
- Can estimate any sum of IID RVs (if µ, σ² known) ← our next goal

21 Crashing website
Let X = number of visitors to a website per minute: X ~ Poi(100). The server crashes if there are ≥ 120 requests/minute. What is P(crash in the next minute)?
Solution 1 (exact, using Poisson): P(X ≥ 120) = Σ_{i=120}^∞ e^(−100) 100^i / i!
Solution 2 (using CLT): Recall that a sum of IID Poissons is Poisson, so X ~ Poi(100) can be viewed as Σ_{i=1}^n Y_i with Y_i ~ Poi(100/n). The CLT says sums are normally distributed (as n → ∞), so X ≈ N(100, 100).
Using the CLT on a discrete sum → continuity correction:
P(X ≥ 120) ≈ P(X ≥ 119.5) = P((X − 100)/√100 ≥ (119.5 − 100)/10) ≈ 1 − Φ(1.95) ≈ 0.026
Attempted solution 3 (using one-sided Chebyshev): P(X ≥ 120) = P(X − 100 ≥ 20) ≤ σ²/(σ² + 20²) = 100/500 = 0.2
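A sketch comparing the three answers numerically (my own check, assuming SciPy):

```python
from scipy.stats import poisson, norm

lam = 100
exact = 1 - poisson.cdf(119, lam)                  # P(X >= 120), exact Poisson tail
clt = 1 - norm.cdf((119.5 - lam) / lam ** 0.5)     # CLT with continuity correction: P(X >= 119.5)
chebyshev = lam / (lam + 20 ** 2)                  # one-sided Chebyshev bound
print(exact, clt, chebyshev)                       # exact and CLT are close; Chebyshev is much looser
```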

22 Summary: Approximations
- Sum of IID RVs with known E[X], Var(X) → Normal (by the Central Limit Theorem); continuity correction possible
- Poisson (= sum of IID Poissons, where n is arbitrarily large) → Normal, with continuity correction
- Binomial (n independent trials, P(success) = p):
  - → Normal when p is medium and np(1 − p) > 10, with continuity correction
  - → Poisson when p < 0.05 and n > 20, and/or there is slight dependence between trials
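A sketch of how these approximations compare on a Binomial tail (my own illustration with made-up n and p, assuming SciPy); with p small and np(1 − p) well under 10, the Poisson approximation should track the exact value noticeably better than the normal one:

```python
from scipy.stats import binom, norm, poisson

n, p = 100, 0.01                                  # hypothetical values: p small, np(1-p) ~ 1 << 10
exact = 1 - binom.cdf(2, n, p)                    # P(X >= 3), exact Binomial
poi = 1 - poisson.cdf(2, n * p)                   # Poisson(np) approximation
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
nrm = 1 - norm.cdf((2.5 - mu) / sd)               # Normal approximation with continuity correction
print(exact, poi, nrm)                            # Poisson is close; Normal is off here
```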

23 Another dice game
You will roll 10 six-sided dice (X_1, X_2, ..., X_10).
Let X = total value of all 10 dice = X_1 + X_2 + ... + X_10.
Win if X ≤ 25 or X ≥ 45. What is P(win)?
Roll!!!!!!! And now the truth (according to CLT)!

24 Another dice game
You will roll 10 six-sided dice (X_1, X_2, ..., X_10). Let X = total value of all 10 dice. Win if X ≤ 25 or X ≥ 45. What is P(win)?
Solution: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45)
By CLT, X = Σ_{i=1}^{10} X_i is approximately N(10µ, 10σ²), where E[X_i] = µ = 3.5 and Var(X_i) = σ² = 35/12.
So E[X] = 35 and Var(X) = 10·35/12 = 350/12.
1 − P(25 < X < 45) ≈ 1 − P(25.5 ≤ X ≤ 44.5)   (continuity correction!)
= 1 − P((25.5 − 35)/√(350/12) ≤ Z ≤ (44.5 − 35)/√(350/12))
= 1 − (Φ(9.5/√(350/12)) − Φ(−9.5/√(350/12))) = 1 − (2Φ(1.76) − 1)

25 Another dice game
You will roll 10 six-sided dice (X_1, X_2, ..., X_10). Let X = total value of all 10 dice. Win if X ≤ 25 or X ≥ 45. What is P(win)?
Solution: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45) ≈ 1 − (2Φ(1.76) − 1) ≈ 0.078
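The CLT answer can also be checked by brute force. A simulation sketch (my own, not from the slides; assumes NumPy):

```python
import numpy as np

# Estimate P(X <= 25 or X >= 45) for X = sum of ten fair six-sided dice.
rng = np.random.default_rng(3)
totals = rng.integers(1, 7, size=(1_000_000, 10)).sum(axis=1)
print(np.mean((totals <= 25) | (totals >= 45)))   # compare against the CLT answer above
```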

26 Finding Statistics
What do you know? → What statistic can you find?
- Distribution with parameters → anything your heart desires
- Sample of population (bootstrapping) → estimates of E[X] or Var(X), confidence intervals, p-values
- RVs with known E[X] or Var(X) → sums of IID RVs (via the CLT), probability bounds

27 Now
What do you know? Distribution with parameters, RVs.
What statistic can you find? Anything your heart desires.
But what if you don't know the parameters of your distribution?

28 Parameter estimation
def: A parametric model is a probability distribution that has parameters θ.
Model        Distribution                      Parameters θ
Bernoulli    Ber(p)                            θ = p
Poisson      Poi(λ)                            θ = λ
Multinomial  Multinomial(p_1, p_2, ..., p_m)   θ = (p_1, p_2, ..., p_m)
Uniform      Unif(α, β)                        θ = (α, β)
Normal       N(µ, σ²)                          θ = (µ, σ²)
etc.
Note: θ can be a vector of parameters!

29 Why do we care?
def: Parameter estimation:
1. We observe data that has a known model with unknown parameter θ.
2. Use the data to estimate the model parameter as θ̂.
Note: estimators θ̂ are RVs estimating the parameter θ (e.g., sample mean: θ̂ = X̄ estimates the population mean θ = µ).
Point estimate of a parameter: find the best single value for the parameter estimate (as opposed to a distribution of θ̂).
Uses:
- Better understand the process producing the data
- Make future predictions based on the model
- Simulate future processes
- Machine learning: learn a model of the system, then do stuff with the model

30 Estimator bias
def: bias of an estimator θ̂: E[θ̂] − θ
def: unbiased estimator: bias = 0
Example: Sample Mean Estimator X̄ = (1/n) Σ_{i=1}^n X_i, estimating µ.
Bias: E[X̄] − µ = 0 (unbiased estimator)
Example 2: Sample Variance Estimator S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)², estimating σ².
Bias: E[S²] − σ² = 0 (unbiased estimator)
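An empirical check of these bias claims (my own sketch, assuming NumPy): averaging each variance estimator over many small samples should give σ² for the 1/(n − 1) version and (n − 1)/n · σ² for the 1/n version discussed on the later slides.

```python
import numpy as np

# Empirical check: E[S^2] = sigma^2 when S^2 uses the 1/(n-1) divisor.
rng = np.random.default_rng(4)
n, trials, sigma2 = 5, 200_000, 9.0
x = rng.normal(0, sigma2 ** 0.5, size=(trials, n))
print(x.var(axis=1, ddof=1).mean())   # ~9.0: unbiased
print(x.var(axis=1, ddof=0).mean())   # ~(n-1)/n * 9.0 = 7.2: the biased 1/n estimator
```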

31 Estimator consistency
def: consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for all ε > 0
As we get more data, the estimate θ̂ should deviate from the true θ by at most a small amount. (Actually known as weak consistency.)
Example: Sample Mean Estimator X̄ = (1/n) Σ_{i=1}^n X_i, estimating µ.
By the strong law of large numbers (SLLN): P(lim_{n→∞} X̄ = µ) = 1
Implying the weak law of large numbers (WLLN): lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0 for all ε > 0
Equivalently: lim_{n→∞} P(|X̄ − µ| < ε) = 1 for all ε > 0 (consistent estimator)

32 Sample variance consistency
def: biased estimator: E[θ̂] − θ ≠ 0
def: consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for all ε > 0
As n → ∞, the estimate θ̂ should deviate from the true θ by at most a small amount.
Are the below two estimators for the variance σ² biased? Consistent?
1. S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)²  (sample variance)
   Unbiased: E[S²] = σ².
   Consistent: S² = (n/(n−1)) [(1/n) Σ_{i=1}^n X_i² − X̄²], and P(lim_{n→∞} (1/n) Σ_{i=1}^n X_i² = E[X²]) = 1, P(lim_{n→∞} X̄² = µ²) = 1 (by SLLN, X_i i.i.d.), so S² → E[X²] − µ² = σ².
2. (1/n) Σ_{i=1}^n (X_i − X̄)²
   Biased: its expectation is ((n−1)/n) E[S²] = ((n−1)/n) σ² ≠ σ².
   Consistent: for similar reasons as #1 above, where now lim_{n→∞} (n−1)/n = 1.
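A consistency sketch (my own illustration, assuming NumPy): both variance estimators approach σ² as n grows, even though only the first is unbiased at every finite n.

```python
import numpy as np

# Both estimators of Var(X) = 9 approach 9 as n grows (consistency),
# even though the 1/n version is biased for every finite n.
rng = np.random.default_rng(5)
for n in (10, 100, 10_000):
    x = rng.normal(0, 3.0, size=n)
    print(n, x.var(ddof=1), x.var(ddof=0))
```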
