Chapter 9: Sampling Distributions

9.1: Sampling Distributions

IDEA: How often would a given method of sampling give a correct answer if it were repeated many times? That is, if you took repeated samples (MANY repeated samples), how often would the sample reflect the true distribution of the population you are sampling from? This is the basis of statistical inference, which we will study in future chapters. The purpose of this chapter is to prepare us to answer those questions.

Parameter: a ______ that describes a ______. A parameter has ______ (and only one) value; we just don't know what it is. The most important parameters for us are the ______ ($\mu$) and population ______ ($p$ or $\pi$).

Statistic: a ______ that can be ______ from a sample without making use of any unknown ______. In practice we will use ______ to estimate unknown parameters.

Sampling Distribution: the ______ of a ______ is the distribution of the values taken by the ______ in all samples of the same size from the same population.

We can view our ______ as a ______. That is, we have NO WAY of predicting what value we will get from a ______.

EXAMPLE: (p.570/#5) Sampling Test Scores, I

Let us illustrate the idea of a sampling distribution of $\bar{x}$ in the case of a very small sample from a very small population. The population is the scores of 10 students on an exam:

Student #:   0   1   2   3   4   5   6   7   8   9
Score:      82  62  80  58  72  73  65  66  74  62

The parameter of interest is the mean score in this population, which is 69.4. The sample is an SRS drawn from the population. Because the students are labeled 0 to 9, a single random digit from Table B chooses 1 student for the sample.

P: Population, Parameter
S: Sample, Statistic
(a) Use Table B to draw an SRS of size n = 4 from this population. Write the 4 scores in your sample and calculate the mean $\bar{x}$ of the sample scores. This is an ______ of the ______.

Sample:
Scores:
Sample mean $\bar{x}$ =

(b) Repeat this process 10 times. That is, you will take 10 more samples of size 4 and compute each sample's mean. Make a histogram of the 10 values of $\bar{x}$. You are constructing the sampling distribution of $\bar{x}$. Is the center close to 69.4?

Sample:      1   2   3   4   5   6   7   8   9   10
$\bar{x}$:

(c) Ten repetitions give a very crude approximation of the sampling distribution. Now pool your data with that of other students. Use your calculator's list functions to organize and sort the data, and to construct a new histogram of the sample means. Copy your histogram below. Describe the shape of the distribution. Is the center close to 69.4? Is this histogram a better approximation of the sampling distribution?
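Because this population has only 10 students, the full sampling distribution of $\bar{x}$ for n = 4 can be listed exactly instead of approximated by hand. The sketch below is not part of the original exercise; it assumes an SRS means choosing 4 different students (a repeated digit from Table B is skipped), so every sample is one of the 210 possible 4-student combinations. The variable names are illustrative only.

```python
from itertools import combinations
from statistics import mean

# Exam scores for students 0-9, copied from the example above.
scores = [82, 62, 80, 58, 72, 73, 65, 66, 74, 62]
population_mean = mean(scores)

# Every possible SRS of size 4 (students chosen without replacement).
# combinations() works by position, so the two students who scored 62
# are treated as different students.
sample_means = [mean(sample) for sample in combinations(scores, 4)]

# Center of the exact sampling distribution of x-bar.
center = mean(sample_means)

print(f"population mean:            {population_mean:.1f}")
print(f"number of possible samples: {len(sample_means)}")
print(f"mean of all sample means:   {center:.1f}")
print(f"smallest / largest x-bar:   {min(sample_means):.2f} / {max(sample_means):.2f}")
```

Comparing the first and third printed values shows where the sampling distribution is centered relative to the parameter, which is the question parts (b) and (c) ask about the class histogram.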
Describe the histogram you just drew.

Shape:
Center:
Spread:
Unusual Values:

Note: We have no way of knowing whether or not OUR STATISTIC is ______ to the parameter we are trying to ______. We must be aware of ______ and ______.

Unbiased Statistic/Unbiased Estimator: A statistic used to estimate a parameter is ______ if the ______ of its sampling distribution is ______ to the ______ of the parameter being estimated. The statistic is called an ______ of the parameter.

Variability of a Statistic: the variability of a statistic is described by the ______ of the sampling distribution. The spread is determined by the sampling ______ and the ______ of the sample. Larger samples give ______ spread.

If the population is much larger than the sample (at least 10 times as large), the spread of the ______ is approximately the same for any ______ size. So, the ______ of a sampling distribution depends ONLY on ______ and NOT on the size of the ______.

This means that a survey of, say, 1,200 people will have the same VARIABILITY (or margin of error) whether the population being sampled is the city of Fullerton or the entire United States. Although not always intuitive, this concept will be shown throughout this and future chapters.

Suppose you wanted to estimate the distribution of colors of regular M&M's. You decide to take a sample of M&M's. As long as the M&M's are well mixed, the sample doesn't know whether it is coming from a single-serving-size bag of M&M's, a Costco-size bag of M&M's, or a large bucket of M&M's! If any one sample taken is an SRS, the variability of the result depends only on the size of the sample.
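One way to see why the population size hardly matters is the standard finite population correction, which these notes do not state: for an SRS of size n drawn without replacement from a population of size N, the large-population spread of a sample mean or proportion is multiplied by sqrt((N - n) / (N - 1)). The sketch below evaluates that factor for the 1,200-person survey. The population sizes are hypothetical round numbers chosen for illustration, not real figures for any city or country, and `fpc` is an illustrative helper name.

```python
import math


def fpc(population_size: int, sample_size: int) -> float:
    """Finite population correction for an SRS drawn without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


n = 1200  # survey size used in the text

# Hypothetical population sizes (assumptions for illustration only):
# exactly 10 times the sample, a mid-sized city, a large country.
population_sizes = [12_000, 150_000, 300_000_000]

factors = {N: fpc(N, n) for N in population_sizes}

for N, factor in factors.items():
    print(f"N = {N:>11,}: spread is multiplied by {factor:.6f}")
```

The closer the factor is to 1, the less the population size affects the spread, so the printed values show how quickly it stops mattering once the population is at least 10 times the sample.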
OUR GOAL: we want to have ______ AND ______.

[Figure: four panels, each to be labeled "______ Bias, ______ Variability"]

9.2 Sample Proportions

The objective of some statistical applications is to reach a conclusion about a population proportion, $p$. For example, we may try to estimate an approval rating through a survey, or test a claim about the proportion of defective light bulbs in a shipment based on a random sample. Since $p$ is unknown to us, we must base our conclusion on a sample proportion, $\hat{p}$. However, we know that the value of $\hat{p}$ will vary from sample to sample. The amount of variability will depend on the size of our sample.

Our estimator is the proportion of successes:

$\hat{p} = \frac{\text{count of "successes" in sample}}{\text{size of sample}} = \frac{X}{n}$

Note: because the values of $X$ and $\hat{p}$ will vary in repeated samples, both $X$ and $\hat{p}$ are ______.

Something to think about

Proportions are just another way of looking at counts. For example, I can talk about how many male students I have in this class, or I can talk about the proportion of males in the class. These are two
different ways of looking at the same information. So don't be too surprised if we find that much of what we learn about ______ is based on what we already know about ______.

Sampling Distribution of a Sample Proportion: Choose an SRS of size $n$ from a large population with population proportion $p$ having some characteristic of interest. Let $\hat{p}$ be the proportion of the sample having the characteristic. Then:

The ______ of the sampling distribution of $\hat{p}$ is $p$.

The ______ of the sampling distribution of $\hat{p}$ is $\sqrt{\frac{p(1-p)}{n}}$.

RULE OF THUMB #1: Use the recipe for the standard deviation of $\hat{p}$ ONLY when the population is at least ______ as large as the sample; that is, when ______. Where ______ is the size of the ______ and ______ is the size of the ______.

Note: we will use this rule throughout the rest of the year whenever our interest is drawing a sample to make inferences about a population. We are interested in sampling only when the population is large enough to make taking a census impractical.

RULE OF THUMB #2: Use the Normal approximation to the sampling distribution of $\hat{p}$ for values of $n$ and $p$ that satisfy ______ and ______.

EXAMPLE: Based on Census data, we know 11% of US adults are Black. Therefore, $p = 0.11$. We would expect a sample to contain roughly 11% Black representation. Suppose a sample of 1500 adults contains 138 Black individuals. Should we suspect undercoverage in the sampling method?

Note, $\hat{p} = \frac{138}{1500} = 0.092$.

Is this lower than what would be expected by chance? That is, we know it is possible that a sample could contain 9.2% Black representation, but is it likely that this would happen due to natural variation in a random sampling method?

Check assumptions:
Rule of thumb #1:
Rule of thumb #2:
Find the mean and standard deviation:

Calculate the probability:

Interpret in context:

9.3 Sample Means

When the objective of a statistical application is to reach a conclusion about a population mean, $\mu$, we must consider a sample mean, $\bar{x}$. However, as we have noted, we know that the value of $\bar{x}$ will vary from sample to sample. The amount of variability will depend on $n$, the size of our sample.

Mean and Standard Deviation of a Sample Mean: Suppose that $\bar{x}$ is the mean of an SRS of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then:

The ______ of the sampling distribution of $\bar{x}$ is: ______

The ______ of the sampling distribution of $\bar{x}$ is: ______

EXAMPLE: ACT Scores

The scores of individual students on the American College Testing (ACT) composite college entrance examination have a Normal distribution with mean 18.6 and standard deviation 5.9.

(a) What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher?
(b) Now take an SRS of 50 students who took the test. What are the mean and standard deviation of the average (sample mean) score of the 50 students?

(c) Do your results depend on the fact that individual scores have a Normal distribution?

(d) What is the probability that the mean score of the 50 students is 21 or higher?

CENTRAL LIMIT THEOREM: Draw an SRS of size $n$ from ______ population whatsoever with mean $\mu$ and finite standard deviation $\sigma$. When $n$ is large, the ______ of the sample mean $\bar{x}$ is close to the ______ distribution $N(\mu, \sigma/\sqrt{n})$ with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Note: the CLT discusses the ______ (and only the shape) of the sampling distribution of $\bar{x}$ when $n$ is sufficiently large. If $n$ is not large, the shape of the distribution more closely resembles the shape of the original population. Thus, there are three situations to consider when discussing the shape of the sampling distribution:

Shape of Population          Shape of Sampling Distribution
1. ______                    ______
2. ______                    ______
3. ______                    ______
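The arithmetic in the ACT example can be checked with a few lines of Python. This is a minimal sketch rather than the notes' own method: it uses the standard library's complementary error function in place of a Normal table or calculator command, and `normal_upper_tail` is an illustrative helper name. It assumes the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$, as given in the CLT statement above.

```python
import math


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X >= x) for a Normal variable with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma = 18.6, 5.9  # population mean and standard deviation from the example
n = 50                 # sample size in part (b)

# (a) One randomly chosen student scores 21 or higher.
p_single = normal_upper_tail(21, mu, sigma)

# (b) Mean and standard deviation of the sample mean for n = 50.
mean_xbar = mu
sd_xbar = sigma / math.sqrt(n)

# (d) The mean score of the 50 students is 21 or higher.
p_mean = normal_upper_tail(21, mean_xbar, sd_xbar)

print(f"(a) P(X >= 21)     = {p_single:.4f}")
print(f"(b) mean of x-bar  = {mean_xbar}, sd of x-bar = {sd_xbar:.4f}")
print(f"(d) P(x-bar >= 21) = {p_mean:.4f}")
```

On part (c): the mean and standard deviation in (b) come from the formulas alone, while the Normal calculation in (d) is exact here only because the population itself is Normal. For a non-Normal population the same calculation would be an approximation justified by the Central Limit Theorem when $n$ is large.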