Oct. 30
Assignment: Read Chapter 19. Try exercises 1, 2, and 4 on p. 424.
Exam #2 Results (as percentages): Mean: 71.4, Median: 73.3
Soda attitudes 2015
In a Gallup poll conducted Jul. 8-12, 2015, 1009 adult Americans were asked: "Is regular soda or pop something you actively try to include in your diet, something you actively try to avoid, or something you don't think about either way?" 61% answered "avoid." The fine print (from gallup.com): ±4% margin of error; sample size = 1009.
Soda attitudes 2015
Some notation: (a) $p$ is the true population proportion; (b) $\hat{p}$ is the sample proportion; (c) $n$ is the sample size. So in this case: $\hat{p} = 0.61$, $n = 1009$, and $p$ is unknown.
If the sample proportion is 61%, what does a 4% margin of error mean?
(A) We believe the true proportion is either 57% or 65%.
(B) We believe fewer than 4% of respondents gave inaccurate responses.
(C) It is impossible that we have made an error of more than 4%.
(D) We believe 57% to 65% is a reasonable range for the true proportion.
Review
How did we measure and assess the uncertainty in the sample percentage back in chapter 4?
margin of error = $1/\sqrt{\text{sample size}}$
For this Gallup poll, the sample size is 1009, so we get: margin of error = $1/\sqrt{1009} \approx 1/31.8 \approx 0.031$ (and Gallup says the margin of sampling error is ±4 percentage points).
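The chapter 4 rule is a one-line computation. Here is a minimal Python sketch of it; the function name `old_margin_of_error` is our own, hypothetical choice.

```python
import math

def old_margin_of_error(sample_size):
    """Chapter 4 rule of thumb: margin of error = 1 / sqrt(sample size)."""
    return 1 / math.sqrt(sample_size)

moe = old_margin_of_error(1009)
print(f"sqrt(1009) = {math.sqrt(1009):.1f}")  # about 31.8
print(f"margin of error = {moe:.3f}")         # about 0.031, i.e. 3.1%
```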
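Remember the empirical (68-95-99.7) rule? For a normal curve, about 68% of values lie within ±1 standard deviation of the mean, 95% within ±2, and 99.7% within ±3.
[Figure: three normal curves, horizontal axis in standard deviations from -3 to 3, shading the central 68%, 95%, and 99.7%.]
It takes ±2 standard deviations to get 95%.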
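As a quick check of the rule, the probability that a standard normal value falls within ±k standard deviations is erf(k/√2). This Python sketch (the helper name `prob_within` is hypothetical) prints roughly 68%, 95%, and 99.7%.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z, using the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {prob_within(k):.4f}")
```

Strictly, ±2 standard deviations covers about 95.4%; exactly 95% needs about ±1.96. The rule rounds this to 2.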
A new method for margin of error: Based on the 68-95-99.7 rule, and since there is something appealing about 95%, we can redefine the margin of error as
Margin of error = 2 standard deviations
But standard deviations of what??
Which of the following is NOT a fixed constant (i.e., which one is not necessarily the same for every possible sample we draw)?
(A) The sample proportion
(B) The true population proportion
(C) The standard deviation of all possible sample proportions
(D) The mean of all possible sample proportions
Based on a sample of 1009 American adults, Gallup estimated that 61% of a population of hundreds of millions try to avoid soda. If they took a different sample of 1009, they would have gotten a new sample percentage. It will not always be exactly 61%. If they took lots of samples of 1009, they would get lots of sample percentages. Let's look at a hypothetical histogram for the percentages.
Histogram of 10,000 Percentages
[Figure: histogram of 10,000 sample percentages, each from a sample of 1009; horizontal axis from 44% to 56%.]
Mean = ??? (Okay, I cheated: I used 50% for the true population percent. But I had to use something for the "unknown" population percent!)
Approx. standard deviation = $(53\% - 47\%)/4 = 1.5\%$ (or 0.015)
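A hypothetical histogram like this one can be generated by simulation. The Python sketch below assumes, as the slide does, a true proportion of 50%; the function name and the random seed are our own choices, so the exact numbers will differ slightly from the slide's histogram.

```python
import random
import statistics

def simulate_sample_proportions(p=0.50, n=1009, n_samples=10_000, seed=1):
    """Draw n_samples samples of size n from a population whose true
    proportion is p, and return the list of sample proportions."""
    rng = random.Random(seed)
    props = []
    for _ in range(n_samples):
        # Count the "yes" answers (e.g., "avoid soda") in one sample of n people.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        props.append(successes / n)
    return props

props = simulate_sample_proportions()
print(f"mean of sample proportions: {statistics.mean(props):.3f}")
print(f"SD of sample proportions:   {statistics.stdev(props):.4f}")
```

The simulated standard deviation lands near 0.015, in line with the value read off the histogram.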
So in our example, with a sample size of 1009, the old method for margin of error gives: margin of error = $1/\sqrt{1009} \approx 1/31.8 \approx 0.031$, or 3.1%, and we report 61% ± 3.1%.
But suppose we define the margin of error to be 2 standard deviations. We estimated the standard deviation from the histogram to be 0.015, so 2 standard deviations is $2 \times 0.015 = 0.03$. This nearly agrees with 0.031. Pretty close!
But creating a hypothetical histogram is a royal pain! Is there an alternative?
Formula for estimating the standard deviation of a sample proportion (without a histogram):
standard deviation = $\sqrt{\dfrac{\text{sample proportion} \times (1 - \text{sample proportion})}{\text{sample size}}}$
Or, in our case: standard deviation = $\sqrt{\dfrac{(0.61)(0.39)}{1009}} \approx 0.015$
If we happen to know the true population proportion, we use it instead of the sample proportion. (This is unrealistic; do you see why?)
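The formula replaces the whole simulation with one line of arithmetic. A minimal Python sketch, with a hypothetical helper name:

```python
import math

def sd_sample_proportion(p, n):
    """Estimated standard deviation of a sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

sd = sd_sample_proportion(0.61, 1009)
print(f"standard deviation = {sd:.3f}")           # about 0.015
print(f"margin of error (2 SD) = {2 * sd:.3f}")   # about 0.031
```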
Summary:
1. We take a sample of 1009 adults through landline and cellphone interviews.
2. We estimate the percent of American adults who try to avoid soda: 61%.
3. To assess the uncertainty in the 61% sample figure, we think of a normal curve of sample percentages with a standard deviation of 0.015 = 1.5%.
4. Since this normal curve has 95% of its distribution within 2 × 1.5% of the true value we want to know, we conclude that 61% ± 2 × 1.5% (that is, 58% to 64%) is a reasonable interval of values for that true value to lie in.
Notice: the old M.O.E. formula gives about the same as 2 × 1.5%.
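The summary's interval, and the reason the old formula comes out so close, can be checked in a short Python sketch (the helper name `interval_2sd` is hypothetical). Because $p(1-p)$ is at most $1/4$, two standard deviations is at most $2\sqrt{1/(4n)} = 1/\sqrt{n}$; the old rule is the 2-SD margin for the worst case $p = 0.5$.

```python
import math

def interval_2sd(p_hat, n):
    """Return (low, high) = p_hat ± 2 * sqrt(p_hat(1 - p_hat)/n)."""
    moe = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

low, high = interval_2sd(0.61, 1009)
print(f"61% ± 2 SD: {low:.1%} to {high:.1%}")

# Compare the 2-SD margin with the old 1/sqrt(n) rule for several proportions.
n = 1009
for p in (0.1, 0.3, 0.5, 0.61):
    new = 2 * math.sqrt(p * (1 - p) / n)
    print(f"p = {p:.2f}: 2 SD = {new:.4f}, old rule = {1 / math.sqrt(n):.4f}")
```

Using the unrounded standard deviation gives about 57.9% to 64.1%, close to the 58% to 64% above.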
What to expect from sample proportions: An example
Facts:
(1) Fingerprints may be influenced by prenatal hormones.
(2) Most people have more ridges on the right hand than the left.
(3) People who have more on the left hand are said to have leftward asymmetry.
(4) Women are more likely to have this trait than men.
(5) The proportion of all men who have this trait is about 15%.
In a study of 186 heterosexual and 66 homosexual men, 26 (14%) of the heterosexual men and 20 (30%) of the homosexual men showed the trait. (Reference: Hall, J. A. Y. and Kimura, D., "Dermatoglyphic Asymmetry and Sexual Orientation in Men," Behavioral Neuroscience, Vol. 108, No. 6, 1203-1206, Dec. 1994.)
Is it unusual to observe a sample proportion of 30% in a sample of 66 men?
We now know what the distribution of sample proportions based on a sample of 66 should look like. We will suppose that the true proportion in the population of men is 15%.
Standard deviation = $\sqrt{\dfrac{(0.15)(1 - 0.15)}{66}} \approx 0.044$
Now what? Let's borrow some old ideas and find a z-score for the 30% observed in the study:
$z = \dfrac{0.30 - 0.15}{0.044} \approx 3.41$
Thus, a sample proportion of 30% would be 3.41 standard deviations above the true proportion of 15% (the mean of the sample proportions), assuming that the sample is a representative sample from the population.
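The same z-score in a Python sketch (the helper name `z_score` is hypothetical). As an extra step beyond the slide, it also prints the upper-tail normal probability $P(Z > z)$. This relies on the normal approximation to the distribution of sample proportions.

```python
import math

def z_score(observed, true_p, n):
    """Number of standard deviations a sample proportion lies above true_p."""
    sd = math.sqrt(true_p * (1 - true_p) / n)
    return (observed - true_p) / sd

z = z_score(0.30, 0.15, 66)
print(f"z = {z:.2f}")
# Upper-tail probability of a standard normal, via the complementary error function.
print(f"P(Z > z) ≈ {0.5 * math.erfc(z / math.sqrt(2)):.5f}")
```

The tail probability is well under 0.1%, which is another way of saying a 30% sample proportion would be very unusual if the true proportion were 15%.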
Histogram of proportions, with normal curve
[Figure: histogram of sample proportions with normal curve; n = 66, true proportion = 0.15, standard deviation = 0.044. The middle ±2 standard deviations run from 0.062 to 0.238; the sample proportion for homosexual men, 0.30, is marked far out in the right tail.]
The sample proportion for homosexual men (30%) is too large to plausibly come from the expected distribution of sample proportions.
Sample means: measurement variables
Suppose we want to estimate the mean weight at PSU.
[Figure: histogram of weight (pounds), with normal curve; horizontal axis from 50 to 250.]
Data from Stat 100 survey, spring 2004. Sample size 237. Mean value is 152.5 pounds. Standard deviation is about $(240 - 100)/4 = 35$.
Notation: sample mean = $\bar{x}$ = 152.5; population mean = $\mu$ = unknown.
What is the uncertainty in the sample mean? We need a margin of error for the sample mean.
Suppose we take another sample of 237. What will the mean be? Will it be 152.5 again? Probably not. Consider what would happen if we took 1000 samples, each of size 237, and computed 1000 sample means.
Hypothetical result, using a "population" that resembles our sample:
[Figure: histogram of 1000 sample means with normal curve, based on samples of size 237; horizontal axis: weight, 145 to 160.]
Extremely interesting: The histogram of means is bell-shaped, even though the original population was skewed!
Standard deviation is about $(157 - 148)/4 = 9/4 = 2.25$.
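A resampling experiment like this can be sketched in Python. Since the survey data are not available here, the sketch assumes a hypothetical right-skewed "population": a gamma distribution with mean 152.5 and standard deviation 35. This is a stand-in, not the Stat 100 data, and the seed and function name are our own choices.

```python
import random
import statistics

# HYPOTHETICAL skewed population of weights: gamma with mean 152.5 and SD 35.
# For a gamma distribution, mean = shape * scale and variance = shape * scale**2.
MEAN, SD = 152.5, 35.0
SCALE = SD**2 / MEAN
SHAPE = MEAN / SCALE

def simulate_means(n=237, n_samples=1000, seed=2004):
    """Return n_samples sample means, each from a sample of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(n_samples)
    ]

means = simulate_means()
print(f"mean of the 1000 sample means: {statistics.fmean(means):.1f}")
print(f"SD of the 1000 sample means:   {statistics.stdev(means):.2f}")
```

Even though the assumed population is skewed, the 1000 means cluster symmetrically around 152.5 with a standard deviation a little above 2.2, similar to the histogram.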
Formula for estimating the standard deviation of the sample mean (no histogram needed)
Just like in the case of proportions, we would like a simple formula for the standard deviation of the mean that doesn't require resampling many times. Suppose we have the standard deviation of the original sample. Then the standard deviation of the sample mean is:
standard deviation of the mean = $\dfrac{\text{standard deviation of the data}}{\sqrt{\text{sample size}}}$
So in our example of weights: the standard deviation of the sample is about 35. Hence, by our formula, the standard deviation of the mean is 35 divided by the square root of 237: $35/15.4 \approx 2.3$. (Recall we estimated it to be 2.25.)
So the margin of error of the sample mean is $2 \times 2.3 = 4.6$.
Report 152.5 ± 4.6 (or 147.9 to 157.1).
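The same calculation as a Python sketch (the helper name `sd_of_mean` is hypothetical):

```python
import math

def sd_of_mean(sd_data, n):
    """Estimated standard deviation of a sample mean: sd / sqrt(n)."""
    return sd_data / math.sqrt(n)

xbar, sd_data, n = 152.5, 35, 237
se = sd_of_mean(sd_data, n)
moe = 2 * se
print(f"SD of the mean = {se:.2f}")
print(f"report {xbar} ± {moe:.1f}: {xbar - moe:.1f} to {xbar + moe:.1f}")
```

Without first rounding the standard deviation to 2.3, the margin comes out slightly smaller (about 4.5 instead of 4.6).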
Example: SAT math scores
Suppose nationally we know that the SAT math test has a mean of 500 points and a standard deviation of 100 points. Draw by hand a picture of what you expect the distribution of sample means based on samples of size 100 to look like.
Sample means have a normal distribution with mean 500 and standard deviation $100/\sqrt{100} = 10$. So draw a bell-shaped curve centered at 500, with 95% of the bell between $500 - 20 = 480$ and $500 + 20 = 520$.
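Normal curve of SAT means, sample size 100
[Figure: normal curve of sample means; horizontal axis: score, 460 to 540.]
A random sample of 100 SAT math scores with a mean of 540 would be very unusual: 540 is $(540 - 500)/10 = 4$ standard deviations above 500. A sample of 100 with a mean of 510 would not be unusual: it is only 1 standard deviation above 500.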
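The "unusual or not" judgment can be made numerically with a z-score for the sample mean. This Python sketch (the helper name `z_for_sample_mean` is hypothetical) also prints the upper-tail normal probability, an addition beyond the slide that relies on the normal approximation.

```python
import math

def z_for_sample_mean(xbar, mu, sigma, n):
    """z-score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))

for xbar in (510, 540):
    z = z_for_sample_mean(xbar, mu=500, sigma=100, n=100)
    tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal
    print(f"sample mean {xbar}: z = {z:.1f}, P(Z > z) ≈ {tail:.5f}")
```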