ACMS 20340 Statistics for Life Sciences Chapter 13: Sampling Distributions

Sampling We use information from a sample to infer something about a population. When using random samples and randomized experiments, we cannot rule out the possibility of incorrect inferences. So we ask: How often would this method give a correct answer if we used it a large number of times?

Some Terminology A parameter is a number which describes some aspect of a population. In practice, we don't know the value of a parameter because we cannot directly examine/measure the entire population. A statistic is a number that can be computed from the sample data, without making use of any unknown parameters. In practice we often use statistics to estimate an unknown parameter.

Mnemonic Device Statistics come from Samples. Parameters come from Populations.

An Illustration According to the 2008 Health and Nutrition Examination Survey, the mean weight of the sample of American adult males was x̄ = 191.5 pounds. 191.5 is a statistic. The population: all American adult males over the age of 20. The parameter: the mean weight of all the members of the population.

On Means We will always use µ to represent the mean of a population. This is a fixed parameter that is unknown when we use a sample for inference. We will always write x̄ for the mean of the sample. This is the average of the observations in the sample.

The Key Question If the sample mean x̄ is rarely exactly equal to the population mean µ and can vary from sample to sample, how can we consider it a reasonable estimate of µ?

The Answer... If we take larger and larger samples, the statistic x̄ is guaranteed to get closer and closer to the parameter µ. This fact is known as the Law of Large Numbers.

The Law of Large Numbers 1 Recall: In the long run, the proportion of occurrences of a given outcome gets closer and closer to the probability of that outcome. E.g. the proportion of heads when tossing a fair coin gets closer to 1/2 in the long run. Similarly, in the long run, the average outcome gets close to the population mean.

The Law of Large Numbers 2 Using the basic laws of probability, we can prove the law of large numbers. The Law of Large Numbers applet is useful for illustrating the law.
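
Besides the applet, a short simulation makes the same point. The sketch below (Python with numpy; the seed and the number of tosses are arbitrary choices, not part of the original slides) tosses a fair coin repeatedly and tracks the running proportion of heads, which settles toward 1/2.

import numpy as np

rng = np.random.default_rng(seed=1)          # reproducible pseudo-random tosses
tosses = rng.integers(0, 2, size=100_000)    # 0 = tails, 1 = heads, fair coin
running_prop = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} tosses, proportion of heads = {running_prop[n - 1]:.4f}")
# The printed proportions wander at first and then settle near 0.5 as the number of tosses grows.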

A Word of Caution Only in the very long run does the sample mean get really close to the population mean, and so in this respect, the Law of Large Numbers is not very practical. However, the success of certain businesses, such as casinos and insurance companies, depends on the Law of Large Numbers.

Sampling Distributions 1 The Law of Large Numbers says that if we measure enough subjects, the statistic x̄ will eventually get close to the parameter µ. What if we can only take samples of a smaller size, say 10?

Sampling Distributions 2 What would happen if we took many samples of 10 subjects from this population? To answer this question: take a large number of samples of size 10 from the population, calculate the sample mean x̄ for each sample, make a histogram of the values of x̄, and examine the distribution displayed in the histogram (shape, center, spread, outliers, etc.).

By Way of Example... 1 High levels of dimethyl sulfide (DMS) in wine cause the wine to smell bad. Winemakers are thus interested in determining the odor threshold, the lowest concentration of DMS that the human nose can detect. The threshold varies from person to person, so we'd like to find the mean threshold µ in the population of all adults. An SRS of size 10 yields the values 28 40 28 33 20 31 29 27 17 21 and thus we have a sample mean x̄ = 27.4.

By Way of Example... 2 It turns out that the DMS odor threshold of adults follows a roughly Normal distribution with µ = 25 mg/l and standard deviation σ = 7 mg/l. By following the procedure outlined before (taking 1,000 SRSs), we produce a histogram that displays the distribution of the values of x̄ from the 1,000 SRSs. This histogram displays the sampling distribution of the statistic x̄.
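
This histogram can be reproduced with a short simulation. The sketch below (Python with numpy; the random seed is an arbitrary choice, everything else follows the slide: 1,000 SRSs of size 10 from a N(25, 7) population) summarizes the resulting sample means.

import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n, n_samples = 25.0, 7.0, 10, 1000

# One row per SRS, one column per subject; the row means are the x-bar values.
samples = rng.normal(mu, sigma, size=(n_samples, n))
xbars = samples.mean(axis=1)

print("mean of the sample means:", round(xbars.mean(), 2))       # close to mu = 25
print("s.d. of the sample means:", round(xbars.std(ddof=1), 2))  # close to sigma/sqrt(n)
# np.histogram(xbars, bins=20) gives the counts behind such a histogram.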

By Way of Example... 3

The Official Definition The sampling distribution of a statistic is the distribution of values taken by the statistic over all possible samples of some fixed size from the population. Thus, the histogram on the previous slide actually displays an approximation to the sampling distribution of the statistic x̄. Important point: The sample mean is a random variable! Since good samples are chosen randomly, statistics such as the sample mean x̄ are random variables. Thus we can describe the behavior of a sample statistic by means of a probability model.

An Important Difference The law of large numbers describes what would happen if we took random samples of increasing size n. A sampling distribution describes what would happen if we took all random samples of a fixed size n.

Examining the Sampling Distribution Shape: It appears to be Normal. Center: The mean of the 1,000 values of x̄ is 24.95, very close to the population mean µ = 25. Spread: The s.d. of the 1,000 values of x̄ is 2.217, much smaller than the population s.d. σ = 7.

A General Fact When we choose many SRSs from a population, the sampling distribution of the sample means is centered at the mean of the original population. But the sampling distribution is also less spread out than the distribution of individual observations.

More Precisely Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the sampling distribution of x̄ has mean µ_x̄ and standard deviation σ_x̄ = σ/√n. Note that µ_x̄ = µ; the subscript notation simply distinguishes the two distributions. (In the DMS example, σ_x̄ = 7/√10 ≈ 2.21 mg/l, in line with the 2.217 observed across the 1,000 simulated samples.) Because the mean µ_x̄ of the sampling distribution of the statistic x̄ is equal to µ, we say that x̄ is an unbiased estimator of the parameter µ.

Unbiased Estimators An unbiased estimator is correct on the average over many samples. Just how close the estimator will be to the parameter in most samples is determined by the spread of the sampling distribution. If the individual observations have s.d. σ, then sample means x̄ from samples of size n have s.d. σ/√n. Thus, averages are less variable than individual observations.

For a Normal Population If individual observations have the distribution N(µ, σ), then the sample mean x̄ of an SRS of size n has the distribution N(µ, σ/√n).
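
This result already lets us answer probability questions about x̄ in the DMS example. A minimal sketch (Python with scipy; the cutoff of 27.4 mg/l is simply the sample mean observed earlier, used here only for illustration):

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 25.0, 7.0, 10
sd_xbar = sigma / sqrt(n)                    # standard deviation of x-bar, about 2.21 mg/l

# P(x-bar >= 27.4) when x-bar has the N(25, 2.21) distribution
p = norm.sf(27.4, loc=mu, scale=sd_xbar)     # survival function = 1 - cdf
print(round(p, 3))                           # roughly 0.14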

Seeing is Believing

Non-Normal Distributions? We know what the values of the mean and standard deviation of x̄ will be, regardless of the population distribution. But what can be known about the shape of the sampling distribution? If the population distribution is Normal, the sampling distribution is Normal. If the population distribution is not Normal, the sampling distribution is ?????

Central Limit Theorem Remarkably, as the size of the samples drawn from a non-Normal population increases, the sampling distribution of x̄ changes shape. In fact, the sampling distribution looks more and more like a Normal distribution, regardless of what the population distribution looks like. This idea is the Central Limit Theorem.

The Official Definition Draw an SRS of size n from any population with mean µ and standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is a random variable with distribution (roughly) N(µ, σ/√n).

So Why Do We Care? The Central Limit Theorem allows us to use Normal probability calculations to answer questions about sample means, even if the population distribution is not Normal.

Central Limit in Action (a) Strongly skewed population distribution. (b) Sampling distribution of x̄ with n = 2. (c) Sampling distribution of x̄ with n = 10. (d) Sampling distribution of x̄ with n = 25.
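
The four panels can be mimicked numerically. In the sketch below (Python with numpy/scipy; the exponential population is an assumed stand-in for the "strongly skewed" distribution in the figure), the skewness of the sample means shrinks as n grows from 2 to 25.

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=3)

# A strongly right-skewed stand-in population: exponential with mean 1.
for n in (1, 2, 10, 25):                     # n = 1 is essentially the population itself
    xbars = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>2}: skewness of the sample means = {skew(xbars):.2f}")
# The skewness moves toward 0, the value for a Normal distribution, as n increases.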

Warning! The CLT applies to sampling distributions, not the distribution of a sample. Now I'm confused. A larger sample size does not make the distribution of a single sample more Normal: a skewed population will likely produce skewed random samples. The CLT only describes the distribution of averages for repeated samples.

Sample Sizes 1 How large does the sample need to be for the sampling distribution of x̄ to be close to Normal? The answer depends on the population distribution: the farther the population is from Normal, the more observations per sample are needed.

Sample Sizes 2 General rule of thumb for sample size n: for skewed populations, a sample of size 25 is generally enough to obtain a Normal sampling distribution; for extremely skewed populations, a sample of size 40 is generally enough to obtain a Normal sampling distribution.

Sample Sizes 3 Angle of big toe deformations in 28 patients. Population likely close to Normal, so sampling distribution should be Normal.

Sample Sizes 4 Servings of fruit per day for 74 adolescent girls. Population likely skewed, but sampling distribution should be Normal due to large sample size.

CLT and Sampling Distributions There are a few helpful facts that come out of the Central Limit Theorem. These are always true, regardless of population distribution. Means of random samples are less variable than individual observations. Means of random samples are more Normal than individual observations.

Sampling Distributions for Probabilities We have seen that sampling distributions are useful for analyzing the means of quantitative variables. But what if we have a categorical variable instead? Fortunately, we can use the sampling distribution of p̂.

Probability and Categorical Variables Categorical variables can take any of a finite number of possible outcomes. We choose one such outcome and call it a success. All other outcomes are then non-successes or failures. Note: This is an arbitrary choice, not a moral judgment.

Terminology An experiment finds that 6 of 20 birds exposed to an avian flu strain develop flu symptoms. Let the random variable X be the number of birds with flu symptoms. Recall: X is a count of the successes of this categorical variable in a fixed number of observations.

Terminology If the number of observations is labeled n, then the sample proportion is p̂ = (count of successes in sample)/(size of sample) = X/n. In the bird example, p̂ = 6/20 = 0.30. Just as for the sample average x̄, we can find the sampling distribution of p̂.

Recall: Binomial Distribution As we saw last week, a binomial distribution describes the count of successes in n independent observations, each with the same probability of success p. Here we will rely heavily on the fact that the binomial distribution (which is discrete) can be approximated by a Normal distribution.

Recall: Normal Approximation to the Binomial Distribution Suppose a count X has a binomial distribution with n observations and success probability p. When n is large, the distribution of X is approximately Normal: N(np, √(np(1 − p))). As a rule of thumb, n should be large enough that the expected counts of successes and failures, np and n(1 − p), are each at least 10.
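
A quick way to gauge the approximation is to compare an exact binomial probability with its Normal counterpart. A sketch (Python with scipy; the n = 20, p = 0.3 values echo the bird-flu example and are used only for illustration):

from math import sqrt
from scipy.stats import binom, norm

n, p = 20, 0.3                               # 20 birds, 30% chance of symptoms for each
mu, sd = n * p, sqrt(n * p * (1 - p))        # Normal approximation N(np, sqrt(np(1 - p)))

exact = binom.cdf(8, n, p)                   # P(X <= 8), exact binomial
approx = norm.cdf(8, loc=mu, scale=sd)       # the same probability from the Normal curve
print(round(exact, 3), round(approx, 3))
# Here np = 6 falls below the at-least-10 rule of thumb, so the two answers agree only roughly.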

Sampling Distribution of a Sample Proportion A count of successes has limited use when comparing different studies, since the sample sizes may differ drastically. The sample proportion p̂ is much more informative, so we take it as our preferred sample statistic. How good is the statistic p̂ as an estimate of the parameter p? Again we ask: what happens with many samples?

The Official Definition Choose an SRS of size n from a large population that has proportion p of successes, and let p̂ be the sample proportion of successes: p̂ = (count of successes in the sample)/n. Then: The mean of the sampling distribution is p. The standard deviation of the sampling distribution is √(p(1 − p)/n). As the sample size increases, the sampling distribution of p̂ becomes approximately Normal.
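
These facts can be checked by simulation, just as we did for x̄. A minimal sketch (Python with numpy; the p = 0.3 and n = 50 values are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(seed=4)
p, n, n_samples = 0.3, 50, 10_000

counts = rng.binomial(n, p, size=n_samples)  # one count of successes per simulated SRS
phats = counts / n                           # 10,000 simulated sample proportions

print("mean of p-hat:", round(phats.mean(), 3))        # close to p = 0.3
print("s.d. of p-hat:", round(phats.std(ddof=1), 3))   # close to sqrt(p(1 - p)/n), about 0.065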

Summary in Picture Form

Warning! Do not use the Normal approximation for the sampling distribution of p̂ when the sample size is small. Also, the population should be much larger than the sample; we'll say at least 20 times larger, as a rule of thumb. The approximation is least accurate when p is close to 0 or 1. (Our sample would contain only successes or only failures unless n is very large.)

Example: Who Gets the Flu? Suppose that we know that 2.5% of all American adults were sick with the flu on a given day of January 2010. The Gallup-Healthways survey interviewed a random sample of 29,483 people and asked each whether they were sick with the flu that day. What is the probability that at least 2.3% of such a sample would answer yes?

Example: Who Gets the Flu? The population proportion is about p = 0.025 and n = 29,483. So the sample proportion p̂ has mean 0.025 and standard deviation √(p(1 − p)/n) = √((0.025)(0.975)/29,483) = 0.00091.

Example: Who Gets the Flu? We want the probability that p̂ is 0.023 or greater. First we standardize p̂ and call the corresponding statistic z: z = (p̂ − 0.025)/0.00091. Now finish the calculation: P(p̂ ≥ 0.023) = P((p̂ − 0.025)/0.00091 ≥ (0.023 − 0.025)/0.00091) = P(z ≥ −2.20) = 1 − 0.0139 = 0.9861.
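
The same answer comes out of a direct calculation. A sketch (Python with scipy), using the p = 0.025 and n = 29,483 figures from the slides:

from math import sqrt
from scipy.stats import norm

p, n = 0.025, 29_483
sd_phat = sqrt(p * (1 - p) / n)              # about 0.00091

# P(p-hat >= 0.023) under the Normal approximation N(p, sd_phat)
prob = norm.sf(0.023, loc=p, scale=sd_phat)
print(round(sd_phat, 5), round(prob, 4))     # roughly 0.00091 and 0.9861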

Example: Who Gets the Flu? There is a more than 98% chance that a random sample of this size will contain at least 2.3% who say yes.