Sampling distributions and estimation.

1) A brief review of distributions:

We're interested in Pr{three sixes when throwing a single die 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). (The symbol => means "implies".)

Other binomial examples: Pr{3 mutants in a sample of 5}; Pr{8 people with blood type O in a sample of 20}. In all cases Y = number of successes, and Y ~ BIN.

We're interested in Pr{Y > 120} where Y is the IQ of a randomly chosen person. It turns out that IQ is normally distributed, and so Y ~ N(μ,σ).

Other normal examples: Pr{Y ≤ y} where y is a specific blood oxygen level, assuming blood oxygen level is normally distributed (actually, it probably isn't). Height, a classic example of a normally distributed random variable: if Y = height => Y ~ N(μ,σ).

Of course, there are many other distributions: Poisson, uniform, etc. If, for example, we have something we know has a Poisson distribution (e.g., the number of ticks on a dog), we could write Y ~ POI(μ). (The Poisson distribution has only one parameter, μ.) A couple of these review probabilities are worked numerically in the code aside below.

2) But this is all review; what's the point here?

Often, in statistics, we're not interested in the probability of an individual number (the examples above). Instead, we are interested in the distribution of the parameter estimates (or sometimes functions of the parameter estimates). This sounds confusing, but here's an example: What is the distribution of the sample mean, ȳ? (We can also ask about the distributions of s² or p̂, but we won't do much with those in this course.)
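(A quick aside before we go on: individual-number probabilities like the review examples above are easy to compute in software. A minimal sketch, assuming scipy is available; the IQ parameters μ = 100, σ = 15 are the usual scaling, not something the notes specify:)

```python
# Two of the review probabilities, computed with scipy.stats.
from scipy.stats import binom, norm

# Pr{exactly 3 sixes in 8 throws of a single die}: Y ~ BIN(n=8, p=1/6)
print(binom.pmf(3, n=8, p=1/6))         # ~0.104

# Pr{Y > 120} for IQ, assuming Y ~ N(mu=100, sigma=15)
print(norm.sf(120, loc=100, scale=15))  # ~0.091
```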
Why is this interesting? Because if we know the distribution of ȳ, we can ask questions like: What is the probability that the average height of men is less than 68 inches? Or better: What is the probability that the average blood pressure with medication is lower than the average blood pressure with a placebo? We can start asking (and answering!) questions that are important in biology, medicine, and elsewhere!

So, to summarize, a sampling distribution is the distribution of one of our parameter estimates, such as ȳ, s², p̂, etc.

3) Another way of looking at it (similar to the book):

Let's do an experiment many times (for example, we take a sample many times). Each time we calculate the sample mean, so we'd have ȳ₁, ȳ₂, ȳ₃, ..., ȳ_m. Then we ask: What is the distribution of these sample means? [Your book uses the term "meta-experiment".] But we also want to know: What is the population mean of all these sample means? What are the population variance and population standard deviation of these sample means? (And, for that matter, how do we estimate these using ȳ and s²?)

As it turns out, we don't need to do m experiments for this to work. Instead we can use some mathematics and figure it out. The math is similar to that used for the mathematical definition of a population mean (in other words, we use expected values to calculate this). We won't go into the details here, though a sketch appears below.

4) So we need to know the answers to the above questions. Without doing the math, here they are:

The (theoretical or population) mean of ȳ is given as follows: μ_ȳ = μ

The (theoretical or population) standard deviation of ȳ is given by: σ_ȳ = σ/√n
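(For the curious, here is a sketch of that expected-value math, assuming y₁, ..., y_n are independent draws from a population with mean μ and variance σ²:)

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\qquad
E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{n\mu}{n} = \mu
\qquad
\operatorname{Var}(\bar{y}) = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(y_i)
  = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}
\;\Longrightarrow\;
\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}
```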
See also p. 159 [159] {151} in your text.

Are you surprised by the first result? Hopefully not. It makes a certain intuitive sense that if ȳ estimates μ, then the mean of a bunch of ȳ's also estimates μ.

The second result is more difficult to understand (it's not quite intuitive). But think about the following: if you use a bunch of ȳ's, you would expect them to be less spread out than the original data. In other words, you would expect the variance (or standard deviation) of ȳ to be less than the variance (or standard deviation) of Y. (The ȳ's are closer to each other than the y's.) And remember that we could do some math and show that this is actually the case.

[Figure: the distribution of ȳ drawn as a narrower curve than the distribution of Y, both centered at μ.]

A small simulation below makes the same point, which also leads us directly into the next question.
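(A minimal simulation of the "meta-experiment", a sketch assuming numpy is available; the numbers μ = 500, σ = 120, n = 4 are borrowed from the pea example later in these notes:)

```python
# Take m samples of size n, compute each sample's mean, and compare the
# spread of the means to the spread of the raw data.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, m = 500, 120, 4, 100_000

samples = rng.normal(mu, sigma, size=(m, n))
ybars = samples.mean(axis=1)     # m sample means: ybar_1, ..., ybar_m

print(round(ybars.mean(), 1))    # ~500.0 -- the mean of the ybars is mu
print(round(ybars.std(), 1))     # ~60.0  -- sigma/sqrt(n) = 120/2, not 120
```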
So what about the actual distribution of Ȳ? If Y is normal, Ȳ is normal (again, hopefully no surprise). If Y is not normal, then if n is large enough, Ȳ is still approximately normal! This answer is due to the Central Limit Theorem (CLT). This is one of the reasons the normal distribution is so important in statistics. (There are some theoretical exceptions to this theorem, but in practice they're usually not that important.)

In other words: take almost any distribution at all. Take the average, ȳ. The average ȳ will be normally distributed if n is large enough. (What is a large enough n? We'll answer that question later when we start looking at the t-distribution.)

So, to summarize, what does all this mean? That we can use our normal tables to calculate probabilities associated with Ȳ. For example, we can answer questions like Pr{Ȳ < 68 inches}, or Pr{Ȳ₁ < Ȳ₂} where Ȳ₁ = average blood pressure on medication and Ȳ₂ = average blood pressure on placebo. (In practice, it turns out that we'll usually wind up using t-tables (i.e., the t-distribution), since our n will often not be large enough, but more on that next time.) Questions like these are much more interesting than the ones we've been asking until now. This will lead us directly into confidence intervals and then (finally!) statistical hypothesis tests.

These concepts are really important, so if you didn't follow everything in class, you might want to reread the notes or go through the text: see section 5.1 [5.1] {5.1}. If you have the 2nd or 3rd edition, you should also read through 5.3 [5.3]; if you have the 4th edition, read through {5.2}.
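(To see the CLT in action, here's a small sketch. The skewed exponential population is my choice; any of several non-normal distributions would do. An exponential has skewness 2, and the skewness of ȳ shrinks toward 0, i.e., toward normality, roughly like 2/√n:)

```python
# CLT demo: sample means from a skewed (exponential) population become
# more and more symmetric -- i.e., closer to normal -- as n grows.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
m = 100_000
for n in (1, 5, 30, 100):
    ybars = rng.exponential(scale=1.0, size=(m, n)).mean(axis=1)
    print(n, round(skew(ybars), 2))   # ~2/sqrt(n): 2.0, 0.9, 0.37, 0.2
```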
5) Enough theory. Let's see how to use sampling distributions.

Example 5.7, p. 160 [5.9, p. 159] {5.2.2, p. 152}: We're looking at princess peas. Somehow, we again know that μ = 500 mg and σ = 120 mg. We also know that Y = seed weight is normally distributed. We take a sample of 4 seeds. What is the (population) mean of Ȳ for our four seeds? What is the population standard deviation?

μ_ȳ = μ = 500 mg

σ_ȳ = σ/√n = 120/√4 = 60 mg

Now suppose we want to calculate Pr{Ȳ > 550 mg}. You proceed the same way as always, except you need to make sure you use the correct value for σ: use the σ for Ȳ, NOT the σ for Y. So let's continue with the example:

z = (ȳ − μ_ȳ)/σ_ȳ = (550 − 500)/60 = 0.83

so we have Pr{Ȳ > 550} = Pr{Z > 0.83} = 1 − 0.7967 = 0.2033

Again, note that we use 60, not 120, in the denominator above. (This calculation is repeated as a code sketch at the end of this section.) If you have the 4th edition, you might also want to check out example {4.2.4, p. 155}, which explains the effect of sample size.

6) Standard Error (SE)

(We skipped over the rest of chapter 5 and are now in chapter 6!) Let's start to worry about the fact that we almost never know μ and σ. So what do we do instead? How can we answer probability questions if we don't know μ and σ? We won't learn the answer to that yet, but it should be obvious that somehow we need to use our estimates (ȳ and s²) instead of μ and σ. We estimate μ with ȳ, and σ with s. We've discussed this before. The fact that we have to estimate σ with s will cause us a few problems later, but let's not worry about it now.
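(Circling back to the pea example for a moment, here is that calculation as a minimal code sketch, scipy assumed:)

```python
# Pr{Ybar > 550} for the pea example: mu = 500, sigma = 120, n = 4.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 500, 120, 4
sd_of_ybar = sigma / sqrt(n)   # 60 -- the sd of Ybar, NOT sigma = 120
z = (550 - mu) / sd_of_ybar    # 0.8333..., which rounds to 0.83
print(norm.sf(z))              # ~0.2023; the table value for z = 0.83 is 0.2033
```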
So how does all this affect our sampling distributions if we don't know μ or σ? In particular, what about the standard deviation of Ȳ? Instead of using σ we use s. In other words, we estimate σ/√n with s/√n. This quantity is also known as the Standard Error (in this case, for ȳ):

SE_ȳ = s/√n

So what does the Standard Error tell us? Basically, how reliable our ȳ is as an estimate of the mean. If the SE is small, our ȳ is a pretty good estimate; if it's big, our ȳ is not so good an estimate. More on this next time. Important: the Standard Error is not always the same; it depends on what you're interested in. If you're interested in ȳ, then it's defined as above. You can also have a standard error for proportions and other quantities.

Let's finish this set of notes by looking at the effect of sample size on some of the quantities we've discussed. Example 6.4, p. 185 [6.4, p. 182] {6.3.2, p. 173-174}: 28 lambs were weighed at birth, with the following results:

ȳ = 5.17 kg, s = 0.65 kg, SE = 0.12 kg

(SE = 0.65/√28 = 0.12; you can figure out the SEs for the rest of the example yourself.) Now what would happen if instead of 28 lambs, we had sampled higher numbers of lambs? (The code sketch below shows the pattern.)
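(A sketch of the lamb SE, plus the effect of hypothetical larger samples with the same s; the larger n values are made up for illustration. Each quadrupling of n cuts the SE in half:)

```python
# SE = s/sqrt(n) for the lamb example, at increasing (hypothetical) n.
from math import sqrt

s = 0.65
for n in (28, 112, 448):
    print(n, round(s / sqrt(n), 3))   # 0.123, 0.061, 0.031
```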
Remember: μ and σ are not random variables. They don't change, and are obviously not affected by sample size! ȳ and s² are random variables. They can change depending on what's in your sample, but usually don't change drastically with sample size. SE is also random, and SE is affected by sample size (it gets smaller, which is easy to see from the equation above). You should make sure you thoroughly understand this example!