COMP6053 lecture: Sampling and the central limit theorem. Markus Brede, mb8@ecs.soton.ac.uk

Populations: long-run distributions Two kinds of distributions: populations and samples. A population is the set of all relevant measurements. Think of it as the big picture.

Populations: finite or infinite? A population can have a finite number of outcomes, but an infinite extent. Consider the set of all possible two-dice totals: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. We can ask what the distribution across totals would be if we threw a theoretical pair of dice an infinite number of times.

Populations: finite or infinite? Alternatively, a population can also have an infinite number of outcomes and an infinite extent. Consider a simulation that produced a predicted global average temperature for 2050. The simulation won't give the same result every time it's run: 15.17, 14.81, 15.02, 14.46... We can ask how the prediction values would be distributed across an infinite number of runs of the simulation, each linked to a different sequence of pseudo-random numbers.

Populations: finite or infinite? A population can be finite but large. The set of all fish in the Pacific Ocean. The set of all people currently living in the UK. A population can be finite and small. The set of Nobel prize winners born in Hungary (9). The set of distinct lineages of living things (only 1, that we know of).

Known population distributions Sometimes our knowledge of probability allows us to specify exactly what the infinite long-run distribution of some process looks like. We can illustrate this with a probability density function. In other words, a histogram that describes the probability of an outcome rather than counting occurrences of that outcome. Take the two-dice case...
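
As a minimal sketch (not the lecture's own code), the exact two-dice distribution can be computed in Python by enumerating all 36 equally likely ordered outcomes:

    from fractions import Fraction

    # Count how many of the 36 equally likely ordered outcomes give each total.
    counts = {}
    for a in range(1, 7):
        for b in range(1, 7):
            counts[a + b] = counts.get(a + b, 0) + 1

    # Convert counts to probabilities; a histogram of these is the
    # probability density (mass) function for the two-dice total.
    for total in sorted(counts):
        p = Fraction(counts[total], 36)
        print(total, p, float(p))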

The need for sampling More commonly, we don't know the precise shape of the population's distribution on some variable. But we'd like to know. We have no alternative but to sample the population in some way. This might mean empirical sampling: we go out into the middle of the Pacific and catch 100 fish in order to learn something about the distribution of fish weights. It might mean sampling from many repeated runs of a simulation.

Samples A sample is just a group of observations drawn in some way from a wider population. Statistics has its roots in the effort to figure out just what you can reasonably infer about this wider population from the sample you've got. The size of your sample turns out to be an important limiting factor.

Sampling from a known distribution How can we learn about the effects of sampling? Let's take a very simple distribution that we understand well: the results from throwing a single die (i.e., the uniform distribution across the integers from 1 to 6 inclusive). We know that the mean of this distribution is 3.500, the variance is 2.917, and the standard deviation is 1.708. Mean = ( 1 + 2 + 3 + 4 + 5 + 6 ) / 6 = 3.5. Variance = ( (1-3.5)^2 + (2-3.5)^2 + ... + (6-3.5)^2 ) / 6 = 2.917.

Sampling from a known distribution Standard deviation = sqrt(variance) = 1.708. We can simulate drawing some samples from this distribution to see how the size of our sample affects our attempts to draw conclusions about the population. What would samples of size one look like? That would just mean drawing a single variate from the population, i.e., throwing a single die, once.
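
A minimal Python sketch of that arithmetic, plus a single simulated throw (random.randint standing in for the die is this sketch's assumption, not the lecture's own code):

    import random

    # Population mean, variance, and standard deviation of one fair die.
    values = range(1, 7)
    mean = sum(values) / 6                                # 3.5
    variance = sum((v - mean) ** 2 for v in values) / 6   # 2.9166...
    sd = variance ** 0.5                                  # 1.7078...
    print(mean, variance, sd)

    # A "sample of size one": a single simulated die throw.
    print(random.randint(1, 6))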

Some samples A small sample of 3 observations gives a mean of 2.667. A larger sample of 25 observations gives a mean of 3.240.

Samples give us varying results In both cases we didn't reproduce the shape of the true distribution, nor get exactly 3.5 as the mean, of course. The bigger sample gave us a more accurate estimate of the population mean, which is hopefully not too surprising. But how much variation from the true mean should we expect if we kept drawing samples of a given size? This leads us to a "meta-property", the sampling distribution of the mean: let's simulate drawing a sample of size 3, 10,000 times, calculate each sample mean, and see what that distribution looks like...
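
A minimal simulation sketch of the experiment just described (the 10,000 repeats come from the text; the helper name sample_mean is this sketch's own):

    import random

    def sample_mean(n):
        # Mean of n simulated throws of a fair die.
        return sum(random.randint(1, 6) for _ in range(n)) / n

    # Draw 10,000 samples of size 3 and look at the spread of their means.
    means = [sample_mean(3) for _ in range(10_000)]
    grand_mean = sum(means) / len(means)
    spread = (sum((m - grand_mean) ** 2 for m in means) / len(means)) ** 0.5
    print(grand_mean, spread)   # roughly 3.5 and 1.708 / sqrt(3) = 0.986

Re-running with sample_mean(25) tightens the spread towards 1.708 / sqrt(25) = 0.342, which previews the next slides.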

Sample distribution of the mean For the sample-size-3 case, it looks like the mean of the sample means centres on the true mean of 3.5. But there's a lot of variation. With such a small sample size, we can get extreme results such as a sample mean of 1 or 6 reasonably often. Do things improve if we look at the distribution of the sample means of samples of size 25, for example?

Sample distribution of the mean So there are a few things going on here... The distribution of the sample means looks like it is shaped like a bell curve, despite the fact that we've been sampling from a flat (uniform) distribution. The width of the bell curve gets gradually smaller as the size of our samples goes up. So bigger samples seem to give tighter, more accurate estimates. Even for really small sample sizes, like 3, the sample mean distribution looks like it is centred on the true mean, but for a particular sample we could be way off.

Sample distribution of the mean Given our usual tools of means, variances, standard deviations, etc., we might ask how to characterize these sampling distributions. It looks like the mean of the sample means will be the true mean, but what will happen to the variance / standard deviation of the sample means? Can we predict, for example, what the variance of the sample mean distribution would be if we took an infinite number of samples of a given size N?

Distribution arithmetic revisited We talked last week about taking the distribution of die-a throws and adding it to the distribution of die-b throws to find out something about two-dice throws. When two independent distributions are "added together", we know some things about the resulting distribution: The means are additive. The variances are additive. The standard deviations are not additive.

Distribution arithmetic revisited A question: what about dividing and multiplying distributions by constants? How does that work?

Distributional arithmetic revisited Scaling a distribution (multiplying or dividing by some constant) can be thought of as just changing the labels on the axes of the histogram. The mean scales directly: E[cX] = c E[X]. This time it's the variance that does not scale directly: V[cX] = E[(cX)^2] - (E[cX])^2 = c^2 V[X]. The standard deviation (in the same units as the mean) scales directly: SD[cX] = sqrt(V[cX]) = c SD[X].
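
A quick numerical check of these scaling rules (a sketch; the constant c = 10 and the helper names are arbitrary choices, not from the lecture):

    import random

    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Simulated die throws, then the same throws scaled by a constant c.
    throws = [random.randint(1, 6) for _ in range(100_000)]
    c = 10
    scaled = [c * x for x in throws]
    print(mean(scaled), c * mean(throws))               # E[cX] = c E[X]
    print(var(scaled), c ** 2 * var(throws))            # V[cX] = c^2 V[X]
    print(var(scaled) ** 0.5, c * var(throws) ** 0.5)   # SD[cX] = c SD[X]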

Distributional arithmetic revisited When we calculate the mean of a sample, what are we really doing? For each observation in the sample, we're drawing a score from the true distribution. Then we add those scores together. So the mean and variance will be additive. Then we divide by the size of the sample. So the mean and standard deviation will scale by 1/N.

Some results For the 1-die case: Mean of the sample total will be 3.5 x N. Variance of the sample total will be 2.917 x N. Standard deviation of the total will be sqrt(2.917 x N). Then we divide through by N... The mean of the sample means will be 3.5 (easy). The variance of the sample means will be 2.917 / N (tricky: have to calculate the SD first). The standard deviation of the sample means will be sqrt(2.917 x N) / N (easy), which comes out as 1.708 / sqrt(N).
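
These results are easy to check by simulation (a sketch; N = 25 and 100,000 repeats are arbitrary choices):

    import random

    N = 25
    repeats = 100_000
    totals = [sum(random.randint(1, 6) for _ in range(N)) for _ in range(repeats)]
    m = sum(totals) / repeats
    v = sum((t - m) ** 2 for t in totals) / repeats
    print(m, 3.5 * N)                              # mean of totals vs 3.5 x N
    print(v, 2.917 * N)                            # variance of totals vs 2.917 x N
    print((v / N ** 2) ** 0.5, 1.708 / N ** 0.5)   # SD of means vs 1.708 / sqrt(N)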

What do we have now? We know that if we repeatedly sample from a population, taking samples of a given size N: The mean of our sample means will converge on the true mean: great news! The standard deviation of our distribution of sample means will tighten up in proportion to 1 / sqrt(N). In other words, accuracy improves with bigger sample sizes, but with diminishing returns. Remember this 1 / sqrt(N) ratio; it's related to something called the standard error, which we'll come back to.

What do we have now? We also have a strong hint that the distribution of our sample means will itself take on a normal or bell curve shape, especially as we increase the sample size. This is interesting because of course the population distribution in this case was uniform: the results from throwing a single die many times do not look anything like a bell curve.

An unusual distribution How strong is this tendency for the sample means to be themselves normally distributed? Let's take a deliberately weird distribution that is as far from normal as possible and simulate sampling from it...

Central limit theorem The central limit theorem states that the mean of a sufficiently large number of independent random variables will itself be approximately normally distributed. Let's look at the distribution of the sample means for our strange distribution, given increasing sample sizes. At first glance, given its tri-modal nature, it's not obvious how we're going to get a normal (bell-shaped) distribution out of this.
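
Since the lecture's exact tri-modal distribution isn't reproduced in this transcript, the sketch below uses an assumed stand-in: a mixture of three widely separated, equally likely clusters. The text histogram of sample means becomes increasingly bell-shaped as n grows:

    import random
    from collections import Counter

    def weird():
        # Assumed stand-in: three widely separated, equally likely clusters.
        return random.choice([0.0, 5.0, 10.0]) + random.uniform(-0.2, 0.2)

    for n in (1, 3, 25):
        means = [sum(weird() for _ in range(n)) / n for _ in range(20_000)]
        # Coarse text histogram: counts of means rounded to the nearest 0.5.
        hist = Counter(round(m * 2) / 2 for m in means)
        print("n =", n)
        for bin_ in sorted(hist):
            print(f"{bin_:5.1f} {'#' * (hist[bin_] // 200)}")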

Central limit theorem We do reliably get a normal distribution when we look at the distribution of sample means, no matter how strange the original distribution that we were sampling from. This surprising result turns out to be very useful in allowing us to make inferences about populations from samples. Python code for the graphs and distributions in this lecture.

Central limit theorem more formally Consider a set of independent, identically distributed random variables X_i with zero mean and variance σ^2. Then we have: (X_1 + ... + X_n) / sqrt(n) → N(0, σ^2), where N(μ, σ^2) is the normal distribution with density 1 / (sqrt(2π) σ) exp( -(x-μ)^2 / (2σ^2) ). Remarks: We can always subtract the mean... so this is general enough. Convergence is in distribution, i.e. not uniform in the centre and tails! (Chernoff's bound, Berry-Esseen theorem.) Finite variance is required here... other versions are available.

Central limit theorem Why? The normal distribution has some special properties, e.g.: If X_1 ~ N(μ_1, σ_1^2) and X_2 ~ N(μ_2, σ_2^2) are independent, then X_1 + X_2 ~ N(μ_1 + μ_2, σ_1^2 + σ_2^2). If X ~ N(μ, σ^2), then cX ~ N(cμ, c^2 σ^2). One can even recover the normal distribution from these properties, i.e. N(0,1) + N(0,1) = sqrt(2) N(0,1) defines the normal distribution (up to scaling). Now, in the CLT, consider convergence to some hypothetical distribution D: (X_1 + ... + X_n) / sqrt(n) → D and (X_1 + ... + X_n + X_{n+1} + ... + X_{2n}) / sqrt(2n) → D. Hence we expect D + D = sqrt(2) D, so we expect the limiting distribution to be normal.
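
A quick numerical sanity check of the sqrt(2) property (a sketch using Python's random.gauss; the sample size is arbitrary):

    import random

    # Sum of two independent standard normals should behave like sqrt(2) * N(0,1).
    zs = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(100_000)]
    m = sum(zs) / len(zs)
    sd = (sum((z - m) ** 2 for z in zs) / len(zs)) ** 0.5
    print(m, sd)   # mean ~ 0, standard deviation ~ sqrt(2) = 1.414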

Central limit theorem Why? So... it is easy to see that, if convergence happens, it would be to a normal distribution. What is not quite so easy to see is that convergence takes place at all. The proof is best done via generating functions, but we won't do it here. It is useful to know that there are generalised versions of the CLT for cases where: the X_i's are not identically distributed, or the variance is infinite.