CHAPTER 18 SAMPLING DISTRIBUTION MODELS STAT 203
Outline
- Sampling Distribution for Proportions
  - Sample Proportions
  - The mean
  - The standard deviation
  - The Distribution Model
  - Assumptions and Conditions
- Sampling Distribution for a Mean
  - The mean, standard deviation and Model
  - The Central Limit Theorem
  - Assumptions and Conditions
  - The effect of Sample Size
But First, an Experiment
If you can, please bring a coin to class. The experiment: each student will flip a coin 20 times and count the number of heads. Let us describe the distribution of the sample proportions of heads. What do we know?
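For readers without a classroom of coins, the experiment can be imitated on a computer. The following is a minimal sketch, not part of the original slides: the class size of 40, the seed, and the helper name `proportion_of_heads` are all assumptions chosen for illustration.

```python
import random

def proportion_of_heads(n_flips, p, rng):
    """Flip a coin n_flips times and return the sample proportion of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < p)
    return heads / n_flips

rng = random.Random(203)   # arbitrary seed so the run is repeatable
class_size = 40            # hypothetical number of students
proportions = [proportion_of_heads(20, 0.5, rng) for _ in range(class_size)]

# Tally how often each sample proportion occurred across the class.
tally = {}
for p_hat in proportions:
    tally[p_hat] = tally.get(p_hat, 0) + 1

# Print a rough text histogram of the sample proportions.
for p_hat in sorted(tally):
    print(f"{p_hat:.2f} {'*' * tally[p_hat]}")
```

Each row of the printout is one possible value of the sample proportion; the stars show how many simulated students obtained it.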
Two Proportions
In each of our little experiments there are two proportions:
- The true proportion, $p$, where $0 \le p \le 1$. This can be the probability of success (as in flipping a coin). It can also be the proportion of a population which has a desired characteristic (e.g. the proportion of BC residents who voted in the last federal election).
- The sample proportion, $\hat{p}$. For a randomly selected sample, each case is deemed a success or a failure, and the sample proportion is given as
$$\hat{p} = \frac{\text{number of successes}}{\text{sample size } n}$$
What's Random and What's Fixed
The population proportion $p$ is fixed. The sample proportion $\hat{p}$ is used to estimate the population proportion (the parameter). Before moving on to estimation, we consider the situation where we know the population proportion. The idea is to view our sample as a realization of a random sample. Hence, our sample statistic is a realization of a random experiment. Its value is one of many possible values it could have taken. The range of possible values the sample statistic can take on, along with the probability of each value, is called the Sampling Distribution.
Sampling Distribution for Sample Proportions 1
In the coin example, we saw we could determine the list of possible values of the sample statistic and the probability of each. That distribution is called the Binomial Distribution, and it is outside the scope of this course. Using it is quite cumbersome and, as it turns out, often not necessary. The Normal Model is quite good at approximating the sampling distribution of the sample proportion, provided the sample size is large enough.
Sampling Distribution for Sample Proportions 2
Knowing that the sampling distribution is Normal is only half the battle: we need the distribution parameters $\mu$ and $\sigma$. Recall that to identify which member of the Normal distribution family we are working with, we need to identify the mean and standard deviation. It turns out these are
$$\mu(\hat{p}) = p \quad \text{and} \quad SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$
So the sampling distribution of $\hat{p}$ is
$$N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$
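As a small illustration, the two parameters of the Normal model for a sample proportion (mean $p$ and standard deviation $\sqrt{p(1-p)/n}$) can be computed directly. The function name `proportion_model` is a hypothetical helper, and the numbers are those of the coin experiment (20 flips of a fair coin).

```python
from math import sqrt

def proportion_model(p, n):
    """Return (mean, sd) of the Normal model for the sample proportion."""
    return p, sqrt(p * (1 - p) / n)

# Coin experiment: fair coin, 20 flips per student.
mean_p_hat, sd_p_hat = proportion_model(0.5, 20)
print(f"mean = {mean_p_hat}, sd = {sd_p_hat:.4f}")
```

With these inputs the standard deviation is a little over 0.11, so sample proportions from 20 flips typically land roughly 0.11 away from 0.5.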
About Spread
In this chapter we pretend we know the true population proportion $p$ and work out probabilities for $\hat{p}$. In situations where we don't know $p$, we consequently don't know the standard deviation. In such cases we need to estimate the standard deviation as well.
- Standard deviation of $\hat{p}$. The true spread.
- Standard error. An estimate of the standard deviation of the sampling distribution.
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
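The difference between the standard deviation and the standard error can be seen side by side. This sketch assumes a fair coin flipped 20 times and a hypothetical observed count of 13 heads; the count is invented purely for illustration.

```python
from math import sqrt

n = 20
p = 0.5          # true proportion, known in the coin setting
heads = 13       # hypothetical observed count, for illustration only
p_hat = heads / n

# True spread of the sampling distribution (requires knowing p).
sd_true = sqrt(p * (1 - p) / n)

# Standard error: the same formula with p replaced by its estimate p_hat.
se_est = sqrt(p_hat * (1 - p_hat) / n)

print(f"SD(p_hat) = {sd_true:.4f}, SE(p_hat) = {se_est:.4f}")
```

The two values are close but not equal, because the standard error depends on the particular sample drawn.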
Conditions for using a Normal Approximation
- The sample is randomly drawn from the population.
- The sampled values must be independent. Individuals are drawn without replacement from the population, so strict independence is never achieved. The assumption is still reasonable as long as the sample size is no greater than 10% of the population size.
- The sample size needs to be large. It is sufficient to verify that both $np \ge 10$ and $n(1-p) \ge 10$.
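The two numerical conditions can be checked mechanically. The sketch below uses a hypothetical helper, `normal_conditions`, and an invented population size of 5000; randomization cannot be verified by arithmetic and still has to be judged from how the sample was collected.

```python
def normal_conditions(n, p, population_size=None):
    """Check the numerical conditions for the Normal model of a sample proportion."""
    checks = {
        # Large-sample condition: at least 10 expected successes and failures.
        "success_failure": n * p >= 10 and n * (1 - p) >= 10,
    }
    if population_size is not None:
        # 10% condition: the sample is at most 10% of the population.
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

print(normal_conditions(20, 0.5))                         # coin flips, no finite population
print(normal_conditions(154, 0.21, population_size=5000)) # population size is hypothetical
```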
Example: Our Experiment
In our coin example, what's the probability of flipping a coin 20 times and being off by more than 0.25 from $p$? What is the probability of being within 0.05 of the true proportion? If we flipped a coin 19 times, what would be the probability of the sample proportion agreeing with the true proportion?
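One way to approach these questions is sketched below, assuming a fair coin ($p = 0.5$). The Normal-model answers use mean $p$ and standard deviation $\sqrt{p(1-p)/n}$ with no continuity correction. The exact Binomial values are included only for comparison, since that distribution is outside the scope of the course; the helper `binom_pmf` is an illustrative name.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.5  # assumes a fair coin

def binom_pmf(k, n, p):
    """Exact probability of k heads in n flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 20
model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Normal-model approximations.
normal_off = model.cdf(p - 0.25) + (1 - model.cdf(p + 0.25))
normal_within = model.cdf(p + 0.05) - model.cdf(p - 0.05)

# Exact values, phrased in head counts to avoid rounding issues:
# more than 0.25 from 0.5 means more than 5 heads away from 10;
# within 0.05 of 0.5 means at most 1 head away from 10.
exact_off = sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - 10) > 5)
exact_within = sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - 10) <= 1)

# With 19 flips, p_hat = 0.5 would need 9.5 heads, so no count qualifies.
exact_match_19 = sum(binom_pmf(k, 19, p) for k in range(20) if 2 * k == 19)

print(f"off by > 0.25: normal {normal_off:.4f}, exact {exact_off:.4f}")
print(f"within 0.05:   normal {normal_within:.4f}, exact {exact_within:.4f}")
print(f"19 flips, p_hat equals p: {exact_match_19}")
```

The Normal and exact answers differ noticeably here. That is expected: with $n = 20$ the condition $np \ge 10$ is only just met, and the sample proportion can take only 21 distinct values.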
Defects
A company known for its cheap toys is being investigated for security purposes. Of the toys they produce, 21% are defective! A sample of 154 toys is taken and tested.
a) Describe the sampling distribution model for the sample proportion by naming the model and telling its mean and standard deviation. Justify your answer.
b) What is the probability that in this sample over 20% of the toys will be found to be defective?
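A possible worked solution is sketched here, assuming the sample is random, the 154 toys are less than 10% of production, and the Normal model applies (both $np$ and $n(1-p)$ exceed 10). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.21, 154

# Large-sample check: expected defective and non-defective counts.
expected_defective = n * p            # about 32
expected_good = n * (1 - p)           # about 122

# (a) Normal model with mean p and sd sqrt(p(1-p)/n).
sd = sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd)

# (b) Probability that the sample proportion exceeds 0.20.
prob_over_20 = 1 - model.cdf(0.20)

print(f"mean = {p}, sd = {sd:.4f}, P(p_hat > 0.20) = {prob_over_20:.3f}")
```

Under these assumptions the probability comes out to roughly 0.62, which is plausible because 0.20 sits just below the mean of 0.21.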
Aliens
Aliens have come to abduct 100 humans (randomly). The humans will be able to choose between participating in their circus or being put in their galactic zoo. Due to ethics, they can't force a human against their choice. They need to have at least 70 people in the circus. It turns out that 66% of humans prefer the circus. What is the approximate probability that their sample will not meet their demands?
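One approximate approach, sketched under the Normal model with no continuity correction: the demand is not met when the sample proportion choosing the circus falls below 0.70. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.66, 100     # true proportion preferring the circus, sample size
needed = 70 / n      # proportion required for the circus

# Normal model for the sample proportion.
model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Demand not met: fewer than 70 of the 100 choose the circus.
prob_short = model.cdf(needed)

print(f"P(p_hat < 0.70) = {prob_short:.3f}")
```

This gives a probability of about 0.80, so the aliens should expect to fall short most of the time.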
Quantitative Data
Proportions are used to summarize categorical variables, and the Normal Model is useful for these. Means summarize quantitative variables, and the Normal Model can be used for these as well. Once again, we need to determine the parameters of this Normal Model. Here we are looking for the parameters of the Sampling Distribution, not those of the population of interest.
Sampling Distribution of the Mean - 1
The sampling distribution of means is the distribution of the sample means of all the possible random samples of size $n$ that could be selected from a population. Suppose a random sample of $n$ subjects is to be drawn from a population, and the observation on a subject ($y$) in the population follows a distribution with mean $\mu$ and standard deviation $\sigma$. The mean of the sampling distribution of means is represented by $\mu_{\bar{y}}$, and is equal to
$$\mu_{\bar{y}} = \mu$$
Sampling Distribution of the Mean - 2
The standard deviation of the sampling distribution of means is represented by $\sigma_{\bar{y}}$ and given by
$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$$
Equivalently, let $y_1, y_2, \ldots, y_n$ be a random sample from a population with mean $\mu$ and standard deviation $\sigma$. The set of sample means in repeated random samples of size $n$ from this population has mean $\mu$ and standard deviation equal to $\sigma/\sqrt{n}$. When $\sigma$ is unknown, $\sigma_{\bar{y}}$ is estimated by replacing $\sigma$ with the sample SD $s$. This is the standard error of $\bar{y}$:
$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
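The two spread formulas for a sample mean, $\sigma/\sqrt{n}$ when $\sigma$ is known and $s/\sqrt{n}$ when it is not, can be sketched as follows. The function names and the numbers (a population SD of 12, a sample of five observations) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import stdev

def sd_of_mean(sigma, n):
    """Standard deviation of the sample mean when sigma is known."""
    return sigma / sqrt(n)

def se_of_mean(sample):
    """Standard error of the sample mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Known population SD (hypothetical values).
print(sd_of_mean(sigma=12, n=36))

# Unknown sigma: estimate the spread from a small made-up sample.
sample = [4, 8, 6, 5, 7]
print(round(se_of_mean(sample), 4))
```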
Conditions for using a Normal Approximation
- The sample is randomly drawn from the population.
- The sampled values must be independent. The sample size is no greater than 10% of the population size.
- The sample size needs to be large. The rule of thumb is $n > 30$.
The Central Limit Theorem
Sit back and think of how powerful what we've just done is. Regardless of the distribution of the data (it could even be categorical), the distribution of the mean (and proportion) is approximately Normal if certain (simple) conditions are met. That's impressive. The result that allows us to claim this is the Central Limit Theorem (CLT): for a large sample size, the sample mean follows an approximately Normal Distribution.
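The CLT can be explored with a quick simulation. This sketch assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen purely for illustration), a sample size of 40, and 5000 repeated samples; none of these choices come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(18)   # arbitrary seed for repeatability
n = 40                    # sample size (above the n > 30 rule of thumb)
reps = 5000               # number of repeated samples

def sample_mean(n, rng):
    """Mean of n draws from a skewed population (exponential, mean 1, SD 1)."""
    return mean([rng.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(n, rng) for _ in range(reps)]

center = mean(sample_means)    # should be near the population mean, 1
spread = stdev(sample_means)   # should be near 1 / sqrt(40)

# Share of sample means within one theoretical SD of the population mean;
# a Normal model predicts roughly 68%.
within_one_sd = sum(abs(m - 1.0) <= 1 / sqrt(n) for m in sample_means) / reps

print(f"center = {center:.3f}, spread = {spread:.3f}, within 1 SD = {within_one_sd:.3f}")
```

Even though individual observations are far from Normal, the sample means cluster symmetrically around the population mean with spread close to $\sigma/\sqrt{n}$.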
Sample Size
There are two advantages to having a large sample size.
- As the sample size increases, the sampling distribution becomes more and more Normal.
- As the sample size increases, the SD and SE both decrease. In other words, our results become more and more precise.
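The second advantage follows from the $\sqrt{n}$ in the denominator of $\sigma/\sqrt{n}$. A small sketch, assuming a hypothetical population standard deviation of 10:

```python
from math import sqrt

sigma = 10  # hypothetical population standard deviation

# Standard deviation of the sample mean for increasing sample sizes.
sds = {n: sigma / sqrt(n) for n in (25, 100, 400)}

for n, sd in sds.items():
    print(f"n = {n:3d}: SD of sample mean = {sd:.2f}")
```

Each fourfold increase in sample size halves the standard deviation, so precision improves, but with diminishing returns.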
Commercial Aircraft
The ages of U.S. commercial aircraft have a mean of 13.0 years and a standard deviation of 7.9 years (based on data from Aviation Data Services). The Federal Aviation Administration randomly selects 36 commercial aircraft for special stress tests.
(a) Describe the sampling distribution of the mean age of a sample of 36 aircraft.
(b) Find the probability that the mean age of this sample group is greater than 15.0 years.
(c) Is the probability calculated in part (b) an exact or an approximate probability? Justify your answer.
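Parts (a) and (b) can be worked as sketched below, assuming the conditions for the Normal model hold (random sample, $n = 36 > 30$, and 36 aircraft being less than 10% of the fleet). Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 13.0, 7.9, 36

# (a) Approximately Normal with mean mu and sd sigma / sqrt(n).
sd_mean = sigma / sqrt(n)
model = NormalDist(mu=mu, sigma=sd_mean)

# (b) Probability that the sample mean age exceeds 15.0 years.
prob_over_15 = 1 - model.cdf(15.0)

print(f"sd of mean = {sd_mean:.3f}, P(mean > 15) = {prob_over_15:.4f}")
```

The result is about 0.064. For part (c), note that the population distribution of ages is not stated to be Normal, so this calculation leans on the Central Limit Theorem.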
Cola
A bottling company uses a filling machine to fill plastic bottles with cola. A bottle should contain 300 ml. In fact, the contents vary, with mean 302 ml and standard deviation 3 ml.
(a) What is the probability that an individual bottle contains less than 300 ml?
(b) What is the probability that the mean contents of bottles in six six-packs is less than 300 ml?
(c) What is the probability that one or more bottles are under-filled?
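A hedged sketch of one way to work this problem follows. It rests on assumptions the slide does not state: that individual bottle contents are Normally distributed (needed for part (a)), that bottles are filled independently, and that part (c) refers to the same 36 bottles as part (b).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 302.0, 3.0
n = 6 * 6   # six six-packs

# (a) ASSUMPTION: individual contents are Normal(302, 3).
bottle = NormalDist(mu=mu, sigma=sigma)
p_a = bottle.cdf(300)

# (b) The mean of 36 bottles has sd sigma / sqrt(36) = 0.5 ml.
mean_model = NormalDist(mu=mu, sigma=sigma / sqrt(n))
p_b = mean_model.cdf(300)

# (c) ASSUMPTION: "one or more" refers to these 36 independent bottles.
p_c = 1 - (1 - p_a) ** n

print(f"(a) {p_a:.4f}  (b) {p_b:.6f}  (c) {p_c:.5f}")
```

Under these assumptions a single bottle is under-filled about a quarter of the time, while the mean of 36 bottles almost never falls below 300 ml, which illustrates how averaging reduces variability.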
When is the CLT not required?
If the distribution of the population you are trying to study is Normal, then the distribution of the sample mean is also Normal. In such cases, the sample size requirement is dropped, but independence and randomness are still required.
We know that birth weights after a regular gestation period follow a Normal Distribution. Using Vital Statistics for the United States, the population mean is found to be 6.5 lbs with SD 0.72 lbs. A sample of 15 babies in Holland leads to a sample mean of 6.95 lbs. What is the probability of observing a sample mean as high or higher if the distribution of birth weights in Holland is the same as in the US?
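Because the population is stated to be Normal, the sample mean of 15 babies is Normal as well, with standard deviation $\sigma/\sqrt{n}$, even though $n$ is small. A minimal sketch of the calculation, assuming the 15 babies are a random, independent sample:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.5, 0.72, 15
observed_mean = 6.95

# Sampling distribution of the mean under the US birth-weight model.
model = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Probability of a sample mean of 6.95 lbs or higher.
prob_as_high = 1 - model.cdf(observed_mean)

print(f"P(mean >= 6.95) = {prob_as_high:.4f}")
```

The probability is under 1%, so a sample mean this high would be unusual if Dutch birth weights followed the US distribution.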