The Central Limit Theorem

Similar documents
Estimating a population mean

How to mathematically model a linear relationship and make predictions.

Sampling Variability and Confidence Intervals. John McGready Johns Hopkins University

Chapter 15 Sampling Distribution Models

Business Statistics: A Decision-Making Approach 6 th Edition. Chapter Goals

Analysis of Variance. Contents. 1 Analysis of Variance. 1.1 Review. Anthony Tanbakuchi Department of Mathematics Pima Community College

STA 291 Lecture 16. Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately) normal

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2

ECON 4130 Supplementary Exercises 1-4

STATISTICS AND BUSINESS MATHEMATICS B.com-1 Private Annual Examination 2015

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz

Key Concept. Properties. February 23, S6.4_3 Sampling Distributions and Estimators

Section B. The Theoretical Sampling Distribution of the Sample Mean and Its Estimate Based on a Single Sample

Estimation and Confidence Intervals

Lecture 27. DATA 8 Spring Sample Averages. Slides created by John DeNero and Ani Adhikari

Math493 - Fall HW 4 Solutions

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

Extra Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences , July 2, 2015

EQ: What is a normal distribution?

Continuous Probability Distributions

At this point, if you ve done everything correctly, you should have data that looks something like:

Modern Methods of Data Analysis - WS 07/08

20 Hypothesis Testing, Part I

Practice Questions for Final

ST 371 (IX): Theories of Sampling Distributions

Preliminary Statistics course. Lecture 1: Descriptive Statistics

Sampling Distribution: Week 6

Statistics and parameters

Section 8.1: Interval Estimation

A two-phase sampling scheme and πps designs

Chapter 18. Sampling Distribution Models. Bin Zou STAT 141 University of Alberta Winter / 10

Continuous Probability Distributions

Contents. 22S39: Class Notes / October 25, 2000 back to start 1

How to mathematically model a linear relationship and make predictions.

WISE International Masters

+ Specify 1 tail / 2 tail

Statistics and Sampling distributions

Part 4: Multi-parameter and normal models

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018

1 The Normal approximation

Lecture 3: Chapter 3

EECS 70 Discrete Mathematics and Probability Theory Fall 2015 Walrand/Rao Final

AMS 5 NUMERICAL DESCRIPTIVE METHODS

Scales of Measuement Dr. Sudip Chaudhuri

( ) P A B : Probability of A given B. Probability that A happens

Sampling. Where we re heading: Last time. What is the sample? Next week: Lecture Monday. **Lab Tuesday leaving at 11:00 instead of 1:00** Tomorrow:

ECE 510 Lecture 6 Confidence Limits. Scott Johnson Glenn Shirley

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

BTRY 4090: Spring 2009 Theory of Statistics

One sided tests. An example of a two sided alternative is what we ve been using for our two sample tests:

T has many other desirable properties, and we will return to this example

1 Random walks and data

Lecture 8 Sampling Theory

Scientific Measurement

What does a population that is normally distributed look like? = 80 and = 10

8/28/2017. PSY 5101: Advanced Statistics for Psychological and Behavioral Research 1

Introduction to Error Analysis

MAT Mathematics in Today's World

Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings

The Central Limit Theorem

Chapter 7: Theoretical Probability Distributions Variable - Measured/Categorized characteristic

The Central Limit Theorem

Answers Part A. P(x = 67) = 0, because x is a continuous random variable. 2. Find the following probabilities:

Sampling Distribution Models. Chapter 17

Stat 315: HW #6. Fall Due: Wednesday, October 10, 2018

[Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4]

How Monte Carlo Sampling Contributes to Data Analysis. Outline

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Advanced Statistical Modelling

Definition of Statistics Statistics Branches of Statistics Descriptive statistics Inferential statistics

STAT 830 Non-parametric Inference Basics

Gov 2000: 9. Regression with Two Independent Variables

TOPIC 12: RANDOM VARIABLES AND THEIR DISTRIBUTIONS

Part I. Experimental Error

CHAPTER 6: SPECIFICATION VARIABLES

Point Estimation and Confidence Interval

Modern Methods of Data Analysis - WS 07/08

Explore the data. Anja Bråthen Kristoffersen

Math 140 Introductory Statistics

Stat 139 Homework 2 Solutions, Spring 2015

Business Statistics. Lecture 10: Course Review

Investigation of Possible Biases in Tau Neutrino Mass Limits

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326

Tutorial on Markov Chain Monte Carlo Simulations and Their Statistical Analysis (in Fortran)

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Math/Stat 352 Lecture 10. Section 4.11 The Central Limit Theorem

Men. Women. Men. Men. Women. Women

STA 218: Statistics for Management

Computational modeling

2.3 Estimating PDFs and PDF Parameters

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

Bootstrapping, Permutations, and Monte Carlo Testing

Agenda: Questions Chapter 4 so far?

Business Statistics. Lecture 5: Confidence Intervals


June 12 to 15, 2011 San Diego, CA. for Wafer Test. Lance Milner. Intel Fab 12, Chandler, AZ

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Lecture 10: Comparing two populations: proportions

Transcription:

Introductory Statistics Lectures The Central Limit Theorem Sampling distributions Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the author 2009 (Compile date: Tue May 19 14:49:45 2009) Contents 1 The Central Limit Theorem 1 1.1 Introduction....... 1 R tip of the day: random number generators.... 1 Population means and sample means.. 2 1.2 Sampling distributions. 3 1.3 Central Limit Theorem: sampling dist. of x............. 5 1.4 Summary........ 7 1.5 Additional Examples.. 7 1 The Central Limit Theorem 1.1 Introduction R TIP OF THE DAY: RANDOM NUMBER GENERATORS Normal random numbers: x=rnorm(n, mean=0, sd=1) Creates n random numbers from a normal distribution and stores them in x. In R, random number generators have a r prefix. Random number generators are useful for simulating random variables and exploring complex systems using Monte Carlo methods. R Command 1

2 of 8 1.1 Introduction Example 1. Generate 3 sets of 8 random random numbers from a normally distributed population with a mean of 10 and standard deviation of 2. R: rnorm ( 8, mean = 10, sd = 2) [ 1 ] 9. 6562 6. 9938 8. 3588 8. 1606 9. 8177 9. 5088 10. 8508 [ 8 ] 6.4181 R: rnorm ( 8, mean = 10, sd = 2) [ 1 ] 7. 0716 6. 9356 6. 6494 12. 9266 12. 0327 10. 1264 12. 6311 [ 8 ] 10.2955 R: rnorm ( 8, mean = 10, sd = 2) [ 1 ] 9. 6340 8. 3440 7. 1764 9. 1673 7. 3618 10. 7930 7. 5256 [ 8 ] 12.5522 Comparison of histogram of 5,000 randomly generated numbers to density function. R: x. rand = rnorm ( 5000, mean = 10, sd = 2) R: curve ( dnorm ( x, mean = 10, sd = 2), 0, 16, l t y = dashed, + ylab = f ( x ), main = Histogram o f random numbers vs t h e o r e t i c a l d e n s i t y ) R: h i s t ( x. rand, prob = T, add = T) Histogram of random numbers vs theoretical density f(x) 0.00 0.05 0.10 0.15 0.20 0 5 10 15 x POPULATION MEANS AND SAMPLE MEANS College aged women s systolic blood pressures (in mm Hg) are normally distributed with a mean of 114.8 and a standard deviation of 13.1. Question 1. If we randomly measured 10 women in this subgroup, would the mean blood pressure be 114.8? Question 2. If we conducted another random sample of 10 women would the mean blood pressure be the same?

The Central Limit Theorem 3 of 8 Question 3. If we repeatedly sample all the possible combinations of 10 women what would the distribution of sample means look like? Sample means Take 5 random samples each with 5 women from the subgroup and computing the mean to estimate the population mean. x = {104, 116, 119, 115, 110} Question 4. Is the sample mean the same each time? Why? Question 5. What is the range of sample mean values? Question 6. What is a quick estimate of their standard deviation? Question 7. How could we improve the sample mean s estimate of the population mean? Sample means Now let s increase n from 5 to 20 for each sample and observe the effect on the sample means. Take 5 random samples each with 20 women from the subgroup and computing the mean to estimate the population mean. x = {114, 115, 117, 114, 115} Question 8. What is the range of sample mean values? Question 9. What is a quick estimate of their standard deviation? Question 10. Does the variation in the x s increase or decrease if we increased the sample size? Question 11. Does the sample mean an estimate of the population mean get better or worse as the sample size increases? 1.2 Sampling distributions Sampling variability. Definition 1.1

4 of 8 1.2 Sampling distributions Definition 1.2 Definition 1.3 Definition 1.4 From sample to sample the value of a statistic will vary due to random fluctuations (sampling error). Sampling distribution. A probability distribution that describes a statistic used to estimate a parameter. Result of sampling variability. Describes how precise and accurate a statistic is for measuring a population parameter. Sampling distribution of the mean. Probability distribution of sample means x of sample size n. Sampling distribution of the proportion. Probability distribution of sample proportions ˆp of sample size n. Comparison of population dist. vs. sample mean dist. Repeatedly sampling 20 women from the subgroup and plotting the distribution of sample means x along with the population density: Population density. vs. sample mean dist. 0.00 0.05 0.10 0.15 population density sample mean dist. for n=20 80 100 120 140 160 blood pressure Definition 1.5 unbiased estimators. If the mean of a sampling distribution for a statistic equals the population parameter it is unbiased. On average, it correctly estimates the parameter. Biased statistics tend to be wrong on average. unbiased statistics mean, variance, proportion biased statistics median, range, standard deviation Good estimators are often unbiased but biased tends to have negative overtones and not all biased statistics are bad. Biased statistics are often used with great effectiveness (eg. standard deviation). Bias is just one measure of how good a statistic is, there are other measures such as consistency, efficiency, and sufficiency. Question 12. Why would we use standard deviation instead of variance if it is biased?

The Central Limit Theorem 5 of 8 Why sample with replacement Sampling with replacement results in independent events, making the probabilities easier to describe and the resulting formulas become much simpler. We will assume sampling with replacement for many cases later in this class. As we have seen, when our sample size is small relative to the population (n/n 0.05), sampling without replacement can be approximated as with replacement. 1.3 Central Limit Theorem: sampling dist. of x Central Limit Theorem (CLT). Definition 1.6 The sampling distribution of x with random sample size n will be normally distributed if 1. the population is normally distributed or 2. the sample size n > 30. 1 and the mean and standard deviation of the sampling distribution will be: µ x = µ (1) standard error: σ x = σ n (2) where µ and σ are the population mean and standard deviation. Very Important! What the CLT tells us If we satisfy the CLT, then x has a normal distribution and a known mean and standard deviation. σ x tells us how precisely we can estimate µ using x. Finally, we can characterize x s sampling error! Increase n to increase the accuracy of the estimate for µ. (increasing n decreases σ x ). Beyond sampling distributions, CLT says that variables determined by complex systems often have a normal distribution (the convolution of a number of density functions tends to the normal). That s one reason why we see it frequently in nature: height, weight,... Question 13. If we repeatedly sample all the possible combinations of 10 women what would the distribution of sample means look like? Case I when CLT applies. 1 Will be approximately normal.

6 of 8 1.3 Central Limit Theorem: sampling dist. of x Since x is normally distributed we see the sample mean x distribution is also normally distributed as predicted by the CLT. Population density. vs. sample mean dist. 0.00 0.04 0.08 population density: mu=100 sd=15 sample mean dist. for n=9 50 100 150 The Empirical rule tells us: 95% of data x : µ x ± 2σ x = 100 ± 2 15 = 100 ± 30 95% of sample means x : µ x ± 2σ x = 100 ± 2 15 9 = 100 ± 10 Case II when CLT applies. If x is not normally distributed (it s a gamma dist.) we see the sample mean x distribution is not normally distributed when n 30, but is approximately normally distributed when n > 30 as predicted by the CLT. Population density. vs. sample mean dist. 0 1 2 3 4 5 population density sample mean dist. for n=5 sample mean dist. for n=40 0.0 0.5 1.0 1.5 2.0

The Central Limit Theorem 7 of 8 Relationship of σ x to n Effect of sample size on standard error σ x 0.2 0.4 0.6 0.8 1.0 σ x = σ n 0 20 40 60 80 100 sample size: n (σ = 1) Increasing n decreases the variation in the sample means σ x. Thus, you can increase the precision of x as an estimate for µ by increasing n. However, each time you double the precision (halve σ x ) you must increase the sample size by fourfold. Tips for solving problems First write down what is given: µ, σ, n,..., then if the question is about: 1. an individual: use µ and σ. 2. the mean or a number of individuals: use µ x and σ x. 1.4 Summary Sampling distribution: describes distribution of a statistic. Smaller sample distribution variation indicates we can more precisely estimate the population parameter. CLT: x normally distributed if x normally distributed or n > 30 then: µ x = µ standard error: σ x = σ n 1.5 Additional Examples Men s hip breadths are normally distributed with a mean of 14.4 in and a standard deviation of 1.0 in.

8 of 8 1.5 Additional Examples Question 14. If one man is randomly selected, find the probability that his hip breadth is between 15 and 17 in. Question 15. If 25 men are randomly selected, find the probability that their mean hip breadth is between 15 and 17 in. Question 16. You need to build a bench that will seat 18 male football players. What is the minimum length of the bench if you want a 0.975 probability that it will fit all 18 men. Question 17. What s wrong with our answer from the previous question?