The Central Limit Theorem


The Central Limit Theorem
Patrick Breheny
September 27
University of Iowa, Biostatistical Methods I (BIOS 5710)

Kerrich's experiment

A South African mathematician named John Kerrich was visiting Copenhagen in 1940 when Germany invaded Denmark. Kerrich spent the next five years in an internment camp. To pass the time, he carried out a series of experiments in probability theory; one of them involved flipping a coin 10,000 times.

The law of averages?

We know that a coin lands heads with probability 50%. So, loosely speaking, if we flip the coin a lot, we should see about the same number of heads and tails. The subject of today's lecture, though, is to be much more specific about exactly what happens, and about precisely what probability theory tells us will happen in those 10,000 coin flips.

Kerrich's results

Tosses (n)    Heads    Heads − 0.5·Tosses
        10        4            −1
       100       44            −6
       500      255             5
     1,000      502             2
     2,000    1,013            13
     3,000    1,510            10
     4,000    2,029            29
     5,000    2,533            33
     6,000    3,009             9
     7,000    3,516            16
     8,000    4,034            34
     9,000    4,538            38
    10,000    5,067            67

Kerrich's results plotted

[Figure: number of heads minus half the number of tosses, plotted against the number of tosses from 10 to 10,000]

Instead of getting closer, the numbers of heads and tails are getting farther apart.

Repeating the experiment 50 times

[Figure: the same quantity plotted for 50 simulated repetitions of the 10,000-flip experiment]

This is not a fluke; it occurs systematically and consistently in repeated simulated experiments.
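The experiment above is easy to reproduce in simulation. A minimal sketch (the random seed and checkpoints are arbitrary choices, not part of Kerrich's original experiment); it shows the raw count drifting away from n/2 even as the proportion of heads settles toward 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
checkpoints = [10, 100, 1_000, 10_000]

# One simulated run of Kerrich's experiment: 10,000 fair coin flips.
flips = rng.integers(0, 2, size=n)   # 1 = heads, 0 = tails
cum_heads = np.cumsum(flips)

for k in checkpoints:
    heads = cum_heads[k - 1]
    print(f"n={k:>6}  heads - n/2 = {heads - k/2:+8.1f}   heads/n = {heads/k:.3f}")
```

Running this a few times with different seeds reproduces the pattern in the figures: the difference column typically grows in magnitude while the proportion column converges.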

Where's the law of averages?

As the figure indicates, simplistic notions like "we should have about the same number of heads and tails" are inadequate to describe what happens with long-run probabilities. We must be more precise about what is happening; in particular, what is getting more predictable as the number of tosses goes up, and what is getting less predictable? Consider instead looking at the percentage of flips that are heads.

Repeating the experiment 50 times, Part II

[Figure: percentage of heads for the same 50 repetitions, plotted against the number of tosses; the curves narrow toward 50% as n grows]

What happens as n gets bigger?

We now turn our attention to obtaining a precise mathematical description of what is happening to the mean (i.e., the proportion of heads) with respect to three trends: its expected value, its variance, and its distribution. First, however, we need to define joint distributions and prove a few theorems about the expectation and variance of sums.

Joint distributions

We can extend the notion of a distribution to include the consideration of multiple variables simultaneously. Suppose we have two discrete random variables X and Y. Then the joint probability mass function is a function f : R² → R defined by

    f(x, y) = P(X = x, Y = y)

Likewise, suppose we have two continuous random variables X and Y. Then the joint probability density function is a function f : R² → R satisfying

    P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy

for all rectangles A ⊆ R².

Marginal distributions

Suppose we have two discrete random variables X and Y. Then the pmf

    f(x) = Σ_y f(x, y)

is called the marginal pmf of X. Likewise, if X and Y are continuous, we integrate out one variable to obtain the marginal pdf of the other:

    f(x) = ∫ f(x, y) dy
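For the discrete case, marginalization is just summing the joint pmf over the other variable. A small sketch with a made-up 2×2 joint pmf (the probabilities are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical joint pmf of discrete X (rows) and Y (columns).
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# f_X(x) = sum over y of f(x, y): collapse the columns.
marginal_x = joint.sum(axis=1)
# f_Y(y) = sum over x of f(x, y): collapse the rows.
marginal_y = joint.sum(axis=0)

print(marginal_x)   # marginal pmf of X
print(marginal_y)   # marginal pmf of Y
```

Each marginal is itself a valid pmf: its entries are nonnegative and sum to 1, just like the full joint table.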

Expectation of a sum

Theorem: Let X and Y be random variables. Then

    E(X + Y) = E(X) + E(Y),

provided that E(X) and E(Y) exist. Note in particular that X and Y do not have to be independent for this to work.

Variance of a sum

Theorem: Let X and Y be independent random variables. Then

    Var(X + Y) = Var(X) + Var(Y),

provided that Var(X) and Var(Y) exist. Note that X and Y must be independent for this to work.
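Both theorems, and the role of independence in the second one, can be checked by simulation. A sketch with arbitrarily chosen normal distributions (the means, SDs, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000

x = rng.normal(loc=1.0, scale=2.0, size=m)   # E(X) = 1, Var(X) = 4
y = rng.normal(loc=3.0, scale=1.0, size=m)   # E(Y) = 3, Var(Y) = 1; independent of x

# Expectation adds regardless of dependence; variance adds only under independence.
mean_sum = (x + y).mean()     # close to 1 + 3 = 4
var_sum = (x + y).var()       # close to 4 + 1 = 5

# A dependent counterexample: Var(X + X) = 4 Var(X) = 16,
# not Var(X) + Var(X) = 8, because X is perfectly dependent on itself.
var_dependent = (x + x).var()
```

Swapping in dependent `x` and `y` (say, `y = -x`) leaves `mean_sum` matching E(X) + E(Y) but breaks the variance formula, which is exactly what the theorems predict.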

The expected value of the mean

Theorem: Suppose X₁, X₂, ..., Xₙ are random variables with the same expected value µ, and let X̄ denote the mean of all n random variables. Then

    E(X̄) = µ

In other words, for any value of n, the expected value of the sample mean is the expected value of the underlying distribution. When an estimator θ̂ has the property that E(θ̂) = θ, the estimator is said to be unbiased. Thus, the above theorem shows that the sample mean is an unbiased estimator of the population mean.

Applying this result to the coin flips example

Theorem: For the binomial distribution, E(X) = nθ. Thus, letting θ̂ = X/n, we have E(θ̂) = θ, which is exactly what we saw in the earlier picture.

[Figure: percentage of heads for the 50 repetitions, centered around 50% at every number of tosses]
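Unbiasedness of θ̂ = X/n is easy to confirm numerically: average the estimator over many repetitions of the experiment and it lands essentially on θ. A sketch (θ, n, the number of repetitions, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.5       # true probability of heads
n = 100           # flips per experiment
reps = 100_000    # number of repeated experiments

# theta_hat = X/n where X ~ Binomial(n, theta); averaging theta_hat
# over many repetitions approximates E(theta_hat), which should equal theta.
theta_hat = rng.binomial(n, theta, size=reps) / n
print(theta_hat.mean())
```

The average of the 100,000 estimates differs from 0.5 only by Monte Carlo noise, regardless of how small n is; unbiasedness holds for every sample size.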

The variance of the mean

Our previous result showed that the sample mean is always close to the expected value, at least in the sense of being centered around it. Of course, how close it is also depends on the variance, which is what we now consider.

Theorem: Suppose X₁, X₂, ..., Xₙ are independent random variables with expected value µ and variance σ². Letting X̄ denote the mean of all n random variables,

    Var(X̄) = σ²/n

Corollary: SD(X̄) = σ/√n

The square root law

To distinguish between the standard deviation of the data and the standard deviation of an estimator (e.g., the mean), estimator standard deviations are typically referred to as standard errors. As the previous slide makes clear, these are not the same; they are related to each other in a very important way, sometimes called the square root law:

    SE = SD/√n

This is true for all averages, although, as we will see later in the course, it must be modified for other types of estimators.
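The square root law can be verified directly: draw many samples of size n, compute each sample's mean, and compare the SD of those means to σ/√n. A sketch (σ and the sample sizes are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0        # population SD (made-up value for illustration)
reps = 100_000     # number of samples per sample size

se_empirical = {}
for n in (4, 16, 64):
    # Draw `reps` samples of size n; the SD of their means estimates the SE.
    means = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
    se_empirical[n] = means.std()
    print(n, round(se_empirical[n], 3), "vs", sigma / np.sqrt(n))
```

Each quadrupling of n halves the empirical SE (2.0/√4 = 1.0, 2.0/√16 = 0.5, 2.0/√64 = 0.25), which is the square root law in action: to cut the standard error in half, you need four times as much data.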

Standard errors

Note that the variance of the sum goes up with n, while Var(X̄) goes down with n, exactly as we saw in our pictures from earlier.

[Figure: the two earlier plots side by side: heads minus half the tosses spreading out as n grows, and percentage of heads narrowing toward 50%]

Indeed, Var(X̄) actually goes down all the way to zero as n → ∞.

The distribution of the mean

Finally, let's look at the distribution of the mean by creating histograms of the mean from our 50 simulations.

[Figure: histograms of the proportion of heads for n = 10, 40, 200, and 800; the distributions become narrower and more bell-shaped as n increases]

The theorem (informal)

In summary, there are three very important phenomena going on here concerning the sample average:

#1 Its expected value is always equal to the population average
#2 Its standard error is always equal to the population standard deviation divided by the square root of n
#3 As n gets larger, its distribution looks more and more like the normal distribution

Furthermore, these three properties of the sample average hold for any distribution.

The theorem (formal)

Central limit theorem: Suppose X₁, X₂, ..., Xₙ are independent random variables with expected value µ and variance σ². Letting X̄ denote the mean of all n random variables,

    √n (X̄ − µ)/σ →d N(0, 1)

The notation →d is read "converges in distribution to", and means that the limit as n → ∞ of the CDF of the quantity on the left is equal to the CDF on the right (at all points x where the CDF on the right is continuous).
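Convergence in distribution can be checked numerically for the coin-flip setting: simulate many standardized sample means and compare their empirical CDF to the standard normal CDF at a few points. A sketch (the values of n, the number of repetitions, and the evaluation points are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(t):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1 + erf(t / sqrt(2)))

rng = np.random.default_rng(4)
theta, n, reps = 0.5, 1024, 50_000

mu = theta
sigma = sqrt(theta * (1 - theta))   # SD of a single Bernoulli(theta) flip

# Standardized sample means: sqrt(n) * (xbar - mu) / sigma
xbar = rng.binomial(n, theta, size=reps) / n
z = sqrt(n) * (xbar - mu) / sigma

# Empirical CDF of z vs. the N(0, 1) CDF at a few points
ecdf = {t: (z <= t).mean() for t in (-1.0, 0.0, 1.0)}
for t, p in ecdf.items():
    print(t, round(p, 3), "vs", round(norm_cdf(t), 3))
```

At n = 1024 the empirical and normal CDFs agree to within a couple of percentage points, matching the "indistinguishable curves" idea on the next slide.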

Graphical idea of convergence in distribution

[Figure: for the binomial distribution with n = 1024, the CDF of √n(X̄ − µ)/σ (blue) plotted against the standard normal CDF (red); the two curves are nearly indistinguishable]

Power of the central limit theorem

This result is one of the most important, remarkable, and powerful results in all of statistics. In the real world, we rarely know the distribution of our data. But the central limit theorem says: we don't have to.

Power of the central limit theorem

Furthermore, as we have seen, knowing the mean and standard deviation of a distribution that is approximately normal allows us to calculate anything we wish to know with tremendous accuracy, and the distribution of the mean is always approximately normal. The caveat, however, is that for any finite sample size, the CLT only holds approximately. How good is this approximation? It depends...

How large does n have to be?

Rules of thumb are frequently recommended that n = 20 or n = 30 is large enough to be sure that the central limit theorem is working. There is some truth to such rules, but in reality, whether n is large enough for the central limit theorem to provide an accurate approximation to the true distribution depends on how close to normal the population distribution is, and thus must be checked on a case-by-case basis. If the original distribution is close to normal, n = 2 might be enough; if the underlying distribution is highly skewed or strange in some other way, n = 50 might not be enough.

Example #1

[Figure: the population density (left) and the distribution of sample means with n = 10 (right)]

Example #2

Now imagine an urn containing the numbers 1, 2, and 9.

[Figure: histograms of the sample mean for n = 20 and n = 50 draws from the urn]
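The urn example can be simulated to see how slowly the skewness of the sampling distribution dies out. A sketch (the seed and number of repetitions are arbitrary; the skewness formula is the standard third standardized moment):

```python
import numpy as np

rng = np.random.default_rng(5)
urn = np.array([1, 2, 9])   # a highly right-skewed "population"
reps = 50_000               # number of simulated samples per sample size

skewness = {}
for n in (20, 50):
    # Draw `reps` samples of n balls (with replacement) and take each mean.
    means = rng.choice(urn, size=(reps, n)).mean(axis=1)
    # Sample skewness of the sampling distribution of the mean
    skewness[n] = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(n, round(skewness[n], 3))
```

Skewness shrinks like 1/√n rather than vanishing at some magic sample size, which is why both histograms on the slide remain visibly asymmetric: n = 50 helps, but it is not enough to erase the skew from this population.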

Example #3

Weight tends to be skewed to the right (more people are overweight than underweight). Let's perform an experiment in which the NHANES sample of adult men is the population. I am going to randomly draw twenty-person samples from this population (i.e., I am re-sampling the original sample).

Example #3 (cont'd)

[Figure: histogram of the sample mean weight for n = 20, roughly bell-shaped and spanning about 160 to 260]

Why do so many things follow normal distributions?

We can see now why the normal distribution comes up so often in the real world: any time a phenomenon has many contributing factors, and what we see is the average effect of all those factors, the quantity will follow a normal distribution. For example, there is no one cause of height: thousands of genetic and environmental factors make small contributions to a person's adult height, and as a result, height is normally distributed. On the other hand, things like eye color, cystic fibrosis, broken bones, and polio have a small number of (or a single) contributing factors, and do not follow a normal distribution.

Summary

E(X + Y) = E(X) + E(Y)
Var(X + Y) = Var(X) + Var(Y) if X and Y are independent
Central limit theorem:
  The expected value of the average is always equal to the population average
  SE = SD/√n
  As n gets larger, the distribution of the sample average looks more and more like the normal distribution
Generally speaking, the sampling distribution looks pretty normal by about n = 20, but this could happen faster or slower depending on the underlying distribution, in particular on how skewed it is