Sections 5.1 and 5.2 Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) 1 / 19
Sampling variability A random sample is exactly that: random. You can collect a sample of n observations and compute the mean Ȳ. Before you do it, Ȳ is random. If you randomly sample a population two different times, taking, e.g. n = 5 each time, the two sample means Ȳ 1 and Ȳ 2 will be different. Example: sampling n = 5 ages from STAT 205. Variability among random samples is called sampling variability. Variability is assessed through a hypothetical mind experiment called a meta-study. 2 / 19
Study and meta-study 3 / 19
Example 5.1.1 Rat blood pressure Study is measuring change in blood pressure in n = 10 rats after giving them a drug, and computing a mean change Ȳ from Y 1,..., Y 10. Meta study (which takes place in our mind) is simply repeating this study over and over again on different samples of n = 10 rats and computing a mean each time Ȳ 1, Ȳ 2, Ȳ 3,... Because the sample is random each time, the means will be different. A (hypothetical) histogram of the Ȳ1, Ȳ2, Ȳ3,... would give the sampling distribution of Ȳ, and smoothed version would give the density of Ȳ. Restated: the sample mean from one randomly drawn sample of size n = 10 has a distribution. 4 / 19
The density of Ȳ Ȳ estimates µ Y = E(Y i ) (E(Y i ) is the expected value of Y), the mean of all the observations in the population. We ll first look at a picture of where the sampling distribution of Ȳ comes from. Then we ll discuss a Theorem that tells us about the mean µȳ, standard deviation σȳ, and shape of the density for Ȳ. 5 / 19
Sampling distribution of Ȳ Meta-experiment... 6 / 19
Sampling distribution of Ȳ 7 / 19
Sampling distribution of Ȳ from normal data If data Y 1, Y 2,..., Y n are normal, then Ȳ is also normal, centered at the same place as the data, but with smaller spread. (a) population distribution of normal data Y 1,..., Y n, and (b) sampling distribution of Ȳ. 8 / 19
In-class exercise You are conducting a study to investigate the probability of getting a head when flipping a quarter. Flip a quarter 5 times, and record the number of heads. Flip a quarter 50 times, and record the number of heads. Report the number of heads in both studies. 9 / 19
Plotting the sampling distribution of means Y i = Pr{head} = 1 + 1 + 0 + 0 + 0 = 0.4 5
Mean and standard deviation when toss a coin 1 time Let Y be binomial with n trials and probability p. µ Y = n p σ Y = n p (1 p) Example: Flip a coin 1 time and let Y be head (Y=1) or tail (Y=0). The probability of getting a head is Pr{Y = 1} = p µ Y = p σ Y = pq When p=0.5, µ Y = 0.5 and σ Y = 0.5 0.5 = 0.5 The standard deviation for sampling distribution of means with sample size 5: 0.5 5 = 0.224 sample size 50: 0.5 50 = 0.071 11 / 19
Example 5.2.2 Seed weights 12 / 19
Example 5.2.2 Seed weights The population of weights of the princess bean is normal with µ = 500 mg and σ = 120 mg. We intend to take a sample of n = 4 seeds and compute the (random!) sample mean Ȳ. E(Ȳ ) = µ Ȳ = µ = 500 mg. On average, the sample mean gets it right. σȳ = σ n = 120 4 = 60 mg. 68% of the time, Ȳ will be within 60 mg of µ = 500 mg by the empirical rule. 13 / 19
Sampling distribution for Ȳ for Example 5.2.2 µȳ = 500 mg and σȳ = 60 mg. 14 / 19
Pr{Ȳ > 550} for n = 4 Recall for n = 4 that µȳ = 500 mg and σȳ = 60 mg. > 1-pnorm(550,500,60) [1] 0.2023284 Note: the probability of Y is exactly equal to some value is always zero. Because the area under the curve at a single point has no width. This is true for all continuous distributions (but not for discrete distributions). Pr{Y = y} = 0 15 / 19
What happens when n is increased? As n gets bigger, σȳ = σ n gets smaller. The density of Ȳ gets more focused around µ. If Y 1,..., Y n come from a normal density, then so does Ȳ, regardless of the sample size. Even if Y 1,..., Y n do not come from a normal density, the Central Limit Theorem guarantees that the density of Ȳ will look more and more like a normal distribution as n gets bigger. 16 / 19
Sampling dist n for Ȳ from different sample sizes n 17 / 19
Remark Sections 5.1 & 5.2 Sampling distribution for Ȳ In a typical biological application, the population size might be 10 6 ; a sample of size n=100 would be a small fraction of the population but would be large enough for the Central Limit Theorem to be applicable in most situation. 18 / 19
15 Min Bonus quiz next lecture Bonus quiz next lecture. Study textbook Section 4.3, 4.4, 5.1, and 5.2. 19 / 19