Chapter 8: Sampling Distributions

Section 8.1 Distribution of the Sample Mean

Frequently, samples are taken from a large population.

Example: American Community Survey (ACS), a survey conducted by the U.S. Census Bureau on a continual basis.

[Figure: a POPULATION with Sample #1, Sample #2, and Sample #3 drawn from it]

Example: Population = Linda's M146 Spring 2017 students (31 total)
Mean age = $\mu$ = ____
Random samples of M146 S17 student ages:
Sample 4 student ages, and calculate the mean of the sample:

Random Sample #1: n = 4    $\bar{x}$ = ____
Random Sample #2: n = 4    $\bar{x}$ = ____

Do you expect either sample mean to be exactly the same as the population mean, the mean age of all students in the class?
Do you expect the two sample means to be the same as each other?

No, because sample means vary from sample to sample! A sample provides data for only a portion of the population, therefore it will not yield perfectly accurate information about the population.
Therefore, the sample mean is a random variable.
Therefore, the sample mean has its own probability distribution (its sampling distribution).

In theory, to obtain a sampling distribution of the sample mean:
1. Take a simple random sample of size n.
2. Compute the sample mean.
3. Repeat, until all possible simple random samples of size n have been obtained.
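The three steps above can be sketched in code for a population small enough to list every possible sample. This is only an illustration: the five ages below are hypothetical (they are not the class data), and the sample size n = 2 is chosen to keep the list short.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five ages (an assumption, NOT the class data).
population = [18, 19, 20, 25, 43]
n = 2  # sample size

# Steps 1-3: form every possible simple random sample of size n
# (no repeats within a sample) and compute the mean of each one.
sample_means = [mean(sample) for sample in combinations(population, n)]

print("population mean:", mean(population))
print("number of possible samples:", len(sample_means))
print("all sample means:", sorted(sample_means))
print("mean of all sample means:", mean(sample_means))
```

For the real class of 31 students and n = 4 there are 31,465 possible samples, which is why in class we only approximate the sampling distribution with a handful of random samples.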
Repeat the sampling procedure:
Take random samples of four students. Calculate the mean age for each sample.
Everyone will get two sets of four random ages from our class population data. Write your two means on a data sheet, without rounding off, in the appropriate category.

4 random student ages from Linda's M146 S17 class.
Mean = ____    Mean = ____    (do NOT round off!)

Class Results
Mean of 4 Ages     Frequency
16.0 - < 17.0      ____
17.0 - < 18.0      ____
18.0 - < 19.0      ____
19.0 - < 20.0      ____
20.0 - < 21.0      ____
21.0 - < 22.0      ____
22.0 - < 23.0      ____
23.0 - < 24.0      ____
24.0 - < 25.0      ____
25.0 - < 26.0      ____
26.0 - < 27.0      ____
27.0 - < 28.0      ____
28.0 - < 29.0      ____
29.0 - < 30.0      ____
The frequency histogram that we just made demonstrates how you can create an experimental sampling distribution of the mean.

How does the sampling distribution that we created compare with the original distribution of the age data (i.e., the population)?

1. How do the shapes of the distributions compare?
   Original population: ____
   Sampling distribution: ____

2. How do the center values (means) of the two distributions compare?
   Original population: ____
   Sampling distribution: ____
Effect of Sample Size on the Sampling Distribution

There is a different sampling distribution for every different sample size.

Repeat the sampling procedure:
Take random samples of ten students. Calculate the mean for each sample. Do NOT round off your mean calculation.
Everyone will get two sets of 10 random ages from our class population data. Write your two means on a data sheet, without rounding off, in the appropriate category.

10 random student ages from Linda's M146 S17 class.
Mean = ____    Mean = ____    (DO NOT round off!)

Class Results
Mean of 10 Ages    Frequency
17.0 - < 18.0      ____
18.0 - < 19.0      ____
19.0 - < 20.0      ____
20.0 - < 21.0      ____
21.0 - < 22.0      ____
22.0 - < 23.0      ____
23.0 - < 24.0      ____
24.0 - < 25.0      ____
25.0 - < 26.0      ____
26.0 - < 27.0      ____
27.0 - < 28.0      ____
28.0 - < 29.0      ____
Summary of Sampling Distribution Experiment:

How does the shape of both sampling distributions compare to the shape of the original population distribution that we were sampling from?

Where are both sampling distributions approximately centered?

What is the difference in the shape of the two distributions, n = 4 vs. n = 10?

What is the difference in the variation of the two distributions, n = 4 vs. n = 10?

WHY do you think those differences occur?
Impact of Sample Size on Sampling Variability:

As we just saw, the sample means cluster more closely around the population mean as the sample size increases.
In other words, the larger the sample size, the smaller the sampling error tends to be when we are trying to estimate a population mean $\mu$ by using a sample mean, $\bar{x}$.
If the sample is larger, its mean is more likely to be closer to the true population mean.

Why do we care about all this?
Sample means (or proportions) will vary! There is ALWAYS some variability in sample statistics.
Usually, pollsters or social scientists or experimenters only get one chance to sample! The mean of a single sample will not necessarily precisely represent the population mean.
However, if we can determine the variability of the sampling distribution (the standard deviation), we can estimate how far off the sample statistic may be from the population parameter.
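The clustering described above can be seen in a small simulation that mimics the class experiment. This is a sketch under stated assumptions: the 31 ages are made up for illustration (they are not the real class data), and the seed and the 5000 repetitions are arbitrary choices.

```python
import random
from statistics import fmean, pstdev

random.seed(146)  # fixed seed so the run is repeatable

# Hypothetical ages for a class of 31 (an assumption, NOT the real class data).
population = [17, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 21, 21, 22, 22,
              23, 24, 25, 26, 27, 28, 30, 32, 34, 36, 38, 41, 44, 47, 51, 55]

def simulate_sample_means(population, n, repetitions=5000):
    """Draw `repetitions` simple random samples of size n; return their means."""
    return [fmean(random.sample(population, n)) for _ in range(repetitions)]

means_n4 = simulate_sample_means(population, 4)
means_n10 = simulate_sample_means(population, 10)

print("population mean:", round(fmean(population), 2))
for label, means in (("n = 4 ", means_n4), ("n = 10", means_n10)):
    print(label, "| mean of sample means:", round(fmean(means), 2),
          "| SD of sample means:", round(pstdev(means), 2))
```

Both sets of sample means should center near the population mean, while the n = 10 means should show the smaller standard deviation.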
The Mean and Standard Deviation of the Sampling Distribution of $\bar{x}$

From our sampling experiments in the last section, we saw that the mean of the means (mean of $\bar{x}$) was approximately the same for both sample sizes, AND that those means were approximately equal to the population mean.

Mean of the Sample Mean:
For samples of size n, the mean of the variable $\bar{x}$ equals the mean of the variable under consideration (whatever x represents).
In other words, for any sample size, the mean of all possible sample means equals the population mean. In symbols:
$\mu_{\bar{x}} = \mu$

Example: M146 Student Ages
Population: $\mu$ = ____
n = 4: Theoretically, $\mu_{\bar{x}}$ = ____    From our actual results, $\mu_{\bar{x}}$ = ____
n = 10: Theoretically, $\mu_{\bar{x}}$ = ____    From our actual results, $\mu_{\bar{x}}$ = ____

However, there WERE some differences in the distributions:
Key difference in the distributions: The standard deviation of $\bar{x}$ gets smaller as the sample size gets larger.

Standard Deviation of the Sample Mean:
For samples of size n, the standard deviation of the sample means (the variable $\bar{x}$) is equal to:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where $\sigma$ = the standard deviation of the population.

Example: M146 Student Ages
Population: $\sigma$ = ____
n = 4: Theoretically, $\sigma_{\bar{x}}$ = ____    From our actual results, $\sigma_{\bar{x}}$ = ____
n = 10: Theoretically, $\sigma_{\bar{x}}$ = ____    From our actual results, $\sigma_{\bar{x}}$ = ____

Sample Size and Variability:
The value $\sigma_{\bar{x}}$ is:
- the standard deviation of the sample means (the $\bar{x}$ values), as opposed to the standard deviation of the individual data values.
- smaller than the standard deviation of the original population (for n > 1), because the sample means have less variation in them than the original data.
- also called the standard error of the mean.
- smaller as the sample size gets larger.
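The formula for the standard deviation of the sample mean can be checked exactly on a tiny population by listing every possible sample. The sketch below assumes a hypothetical five-value population (not the class data) and samples WITH replacement, so that the draws are independent; the helper name `standard_error` is ours.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Hypothetical population (an assumption, NOT the class data).
population = [18, 19, 20, 25, 43]
sigma = pstdev(population)  # population standard deviation

for n in (2, 4):
    # Every ordered sample of size n, drawn with replacement.
    sample_means = [fmean(s) for s in product(population, repeat=n)]
    print(f"n = {n}: SD of all sample means = {pstdev(sample_means):.4f}, "
          f"sigma / sqrt(n) = {standard_error(sigma, n):.4f}")
```

When sampling without replacement from a small population, such as 4 students out of a class of 31, the exact standard deviation of the sample means is somewhat smaller than $\sigma / \sqrt{n}$, which is one reason class results may not match the theoretical value exactly.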
The Shape of the Sampling Distribution of $\bar{x}$

One more concept in this section: looking at the difference between a variable which is normally distributed and a variable that is NOT normally distributed.

Example: M146 Student Ages, shape of distributions
Student ages, population distribution: ____
Sampling distribution of the mean, n = 4: ____
Sampling distribution of the mean, n = 10: ____

Central Limit Theorem (CLT):
For a relatively large sample size, the variable $\bar{x}$ (the sample means) is approximately normally distributed, regardless of the distribution of the variable under consideration. The approximation (to a normal curve) becomes better with increasing sample size.

Large sample size means: n ≥ 30 (the usual rule of thumb).

If the variable is normally distributed to begin with, then any sample size will provide a normal distribution for the variable $\bar{x}$ (the sample means).
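The Central Limit Theorem can be illustrated with a simulation. As an assumption for the sketch, the population is taken to be exponential (a strongly right-skewed, reverse-J shape), and skewness is used as a rough numerical stand-in for "how far from a symmetric bell shape"; the seed and repetition count are arbitrary.

```python
import random
from statistics import fmean, pstdev

random.seed(8)  # fixed seed so the run is repeatable

def skewness(values):
    """Mean cubed z-score: 0 for a symmetric shape, positive if right-skewed."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, repetitions=4000):
    """Means of `repetitions` samples of size n from an exponential population."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]

for n in (1, 5, 30):
    print(f"n = {n:2d}: skewness of sample means = {skewness(sample_means(n)):.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, even though the population itself stays heavily skewed.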
[Figure: three population shapes compared: Normal, Reverse J (right-skewed), and Uniform]
Summary: Sampling Distribution of the Sample Mean

Suppose that a variable x of a population has mean $\mu$ and standard deviation $\sigma$. Samples of size n will be taken from the population, and the mean $\bar{x}$ will be calculated for each sample. Then, for samples of size n:
1. The mean of $\bar{x}$ equals the population mean, or: $\mu_{\bar{x}} = \mu$
2. The standard deviation of $\bar{x}$ equals the population standard deviation divided by the square root of the sample size, or: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. If x (the population variable) is normally distributed, so is $\bar{x}$ (the sample means), regardless of the sample size.
4. If x is NOT normally distributed, $\bar{x}$ will be approximately normally distributed IF the sample size is large (n ≥ 30 is the usual rule of thumb).

Using the Central Limit Theorem (CLT):
The central limit theorem allows us to calculate probabilities or percentages for certain sample means, instead of just for individual values.
Example: Back in Chapter 7, we calculated the probability of selecting a single woman at random and having her height be less than 5 ft.
The population of women's heights is normally distributed with $\mu$ = 63.8 in, $\sigma$ = 2.6 in.
We converted the individual value to a z-score, using $z = \frac{x - \mu}{\sigma}$
Then we used Table V to find the area/probability: P(x < 5 ft) = ____

New question: If a group of 10 women is randomly selected, find the probability that their mean height is less than 5 feet.
Start by describing the sampling distribution of $\bar{x}$ for sample sizes of 10:
Describe the shape: ____
Find the mean $\mu_{\bar{x}}$ = ____ and standard deviation $\sigma_{\bar{x}}$ = ____
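The two height questions can be compared side by side with software instead of Table V. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are ours, and 5 ft is converted to 60 inches.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 63.8, 2.6   # population of women's heights, in inches
cutoff = 60.0           # 5 ft expressed in inches
n = 10                  # size of the group in the new question

# One woman: use the population distribution itself.
z_individual = (cutoff - mu) / sigma
p_individual = NormalDist(mu, sigma).cdf(cutoff)

# Mean of 10 women: same mean, but standard deviation sigma / sqrt(n).
sigma_xbar = sigma / sqrt(n)
z_mean = (cutoff - mu) / sigma_xbar
p_mean = NormalDist(mu, sigma_xbar).cdf(cutoff)

print(f"individual: z = {z_individual:.2f}, P(x < 60) = {p_individual:.4f}")
print(f"mean of {n}: sigma_xbar = {sigma_xbar:.3f}, z = {z_mean:.2f}, "
      f"P(xbar < 60) = {p_mean:.2e}")
```

The individual probability comes out at roughly 0.07, while the probability for the mean of 10 women is far below 0.001, because the sample means vary much less than individual heights. A table lookup with z rounded to two decimals can differ slightly in the last digit.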
Example: Scores for men on the verbal portion of the SAT-I test are normally distributed with a mean of 509 points and a standard deviation of 112 points (based on data from the College Board).

a. If one of the men is randomly selected, find the probability that his score is at least 590.

b. Describe the sampling distribution of $\bar{x}$ for samples of size 16.

c. If 16 of the men are randomly selected, find the probability that their mean score is at least 590. Is that result unusual?

d. There is a 10% probability that the mean score of a random sample of 16 men will exceed what value?
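One way to check hand calculations for parts a through d is sketched below with `statistics.NormalDist`. The variable names are ours, and the "unusual" cutoff of 0.05 is an assumed convention rather than something stated in these notes.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 509, 112, 16

# a. One randomly selected man: P(x >= 590)
p_individual = 1 - NormalDist(mu, sigma).cdf(590)

# b. Sampling distribution of the mean for n = 16: normal (the population
#    is normal), mean 509, standard deviation 112 / sqrt(16) = 28.
sigma_xbar = sigma / sqrt(n)
xbar_dist = NormalDist(mu, sigma_xbar)

# c. P(xbar >= 590); "unusual" here means probability below 0.05 (assumed).
p_mean = 1 - xbar_dist.cdf(590)
is_unusual = p_mean < 0.05

# d. Value exceeded with probability 0.10, i.e. the 90th percentile of xbar.
cutoff_top_10 = xbar_dist.inv_cdf(0.90)

print(f"a. P(x >= 590)    = {p_individual:.4f}")
print(f"b. sigma_xbar     = {sigma_xbar:.1f}")
print(f"c. P(xbar >= 590) = {p_mean:.4f}, unusual: {is_unusual}")
print(f"d. 90th percentile of xbar = {cutoff_top_10:.1f}")
```

Note that part d runs the calculation backwards: it starts from an area (0.90 to the left) and finds the matching value of $\bar{x}$.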
Section 8.2 Distribution of the Sample Proportion

A proportion is the ratio or percentage of a population that has a specified characteristic.

Example: The proportion of males in this class is ____. The percentage is: ____

Key point: it's a different TYPE of data!

Example:
Math 146 student ages data is: quantitative
Can calculate: a mean
vs.
Math 146 student tattoos data is: qualitative
CAN'T calculate: a mean
CAN calculate: proportion of responses that were "yes" or "no":
n = ____    x = ____    sample proportion = $\hat{p} = \frac{x}{n}$ = ____

If I took another (random) sample of CBC students, would I get the same sample proportion of students that have a tattoo?

Therefore, the sample proportion $\hat{p}$ is also a random variable, and has an associated probability distribution.
Example: Proportion of Reese's Pieces that are orange

According to the manufacturer, the Hershey Company, 50% of Reese's Pieces candies are orange.
So in other words: p = 0.50    1 - p = 0.50

Sampling Experiment: Everyone will get two samples of 25 Reese's Pieces each. Calculate the sample proportion of orange candies in your two samples. Then put two hashmarks on the data tables to indicate your two sample proportions.

$\hat{p} = \frac{x}{n} = \frac{\text{number orange}}{\text{total number}} = \frac{\text{number orange}}{25}$
Class Results
Proportion of orange, $\hat{p}$:  0.16  0.20  0.24  0.28  0.32  0.36  0.40  0.44  0.48  0.52  0.56  0.60  0.64  0.68  0.72  0.76  0.80  0.84
Frequency:
Sampling Distribution of Proportions, n = 25

Notice that the shape of the distribution of the sample proportions is approximately normal.
Notice that the mean of the distribution of the sample proportions is approximately equal to the population proportion p.
IF we compared the sampling distribution of proportions for different sample sizes, say n = 5 vs. n = 25, you would notice that the standard deviation of the distribution of the sample proportions decreases as the sample size increases, which is exactly what we saw with the distributions of the sample means.

Summary of Sampling Distribution of $\hat{p}$
For a simple random sample of size n with a population proportion p:
- The shape of the sampling distribution of $\hat{p}$ is approximately normal provided that $np(1 - p) \geq 10$.
- The mean of the sampling distribution of $\hat{p}$ is: $\mu_{\hat{p}} = p$
- The standard deviation of the sampling distribution of $\hat{p}$ is: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

Another requirement of using this model is that the sampled values must be independent of each other. When sampling from finite populations, verify the independence assumption by checking that the sample size n is no more than 5% of the population size.
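The mean and standard deviation formulas for the sample proportion can be compared against a simulated version of the candy experiment. This is a sketch with assumptions: each candy is treated as an independent draw that is orange with probability 0.50, and the seed and the 10,000 simulated samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(25)  # fixed seed so the run is repeatable

p, n = 0.50, 25          # claimed proportion of orange, candies per sample
repetitions = 10_000     # number of simulated samples (an arbitrary choice)

def sample_proportion(p, n):
    """Proportion of successes in n independent draws with success chance p."""
    orange = sum(random.random() < p for _ in range(n))
    return orange / n

p_hats = [sample_proportion(p, n) for _ in range(repetitions)]
theoretical_sd = sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {fmean(p_hats):.4f}  (theory: {p})")
print(f"SD of sample proportions  : {pstdev(p_hats):.4f}  "
      f"(theory: {theoretical_sd:.4f})")
```

Changing `n` to 5 and rerunning should give a visibly larger standard deviation, matching the n = 5 vs. n = 25 comparison described above.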
Example: According to a 2016 study done by the Gallup organization, the proportion of Americans who rate their life well enough to be considered thriving is 0.554, or 55.4% *.
* Source: http://www.gallup.com/poll/194816/americans-life-evaluations-improve-during-obamaera.aspx?g_source=category_wellbeing&g_medium=topic&g_campaign=tiles

a. Suppose a random sample of 100 Americans is asked, "Do you rate your life well enough that you would consider yourself to be thriving?" Is the response to this question qualitative or quantitative?

b. Explain why the sample proportion $\hat{p}$ is a random variable. What is the source of the variability?

c. Verify the model requirements, and describe the sampling distribution of $\hat{p}$, the proportion of Americans who consider themselves to be thriving.
Verify: sample size is less than 5% of population size?
Verify: sample size is large, $np(1 - p) \geq 10$?
Describe: shape, mean, and standard deviation.

d. In a random sample of 100 Americans, what is the probability the proportion who consider themselves to be thriving is greater than 0.60?

e. Would it be unusual for a sample of 100 Americans to reveal that 40 or fewer consider themselves to be thriving?
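Parts c through e can be checked with a short script. The sketch below is one possible arrangement, not a prescribed method: the helper name `phat_distribution` is ours, and the U.S. population figure is a rough assumption that matters only for the 5% check.

```python
from math import sqrt
from statistics import NormalDist

def phat_distribution(p, n, population_size):
    """Check the model requirements, then return the normal model for p-hat."""
    if n > 0.05 * population_size:
        raise ValueError("sample is more than 5% of the population")
    if n * p * (1 - p) < 10:
        raise ValueError("np(1 - p) < 10: normal model not justified")
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

US_POPULATION = 320_000_000  # rough figure (an assumption)

# c. Requirements and description of the sampling distribution
model = phat_distribution(p=0.554, n=100, population_size=US_POPULATION)

# d. P(p-hat > 0.60)
p_greater_060 = 1 - model.cdf(0.60)

# e. 40 or fewer out of 100 means p-hat <= 0.40
p_40_or_fewer = model.cdf(40 / 100)

print(f"c. mean = {model.mean:.3f}, SD = {model.stdev:.4f}")
print(f"d. P(p-hat > 0.60)  = {p_greater_060:.4f}")
print(f"e. P(p-hat <= 0.40) = {p_40_or_fewer:.5f}")
```

The helper raises an error instead of returning a model when a requirement fails, so a probability is never computed from a normal curve that the checks do not support.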
Example: According to recent data from the Gallup organization (February 20, 2017), the proportion of Americans who view North Korea unfavorably is 0.86, or 86% *. Note that this was the lowest favorable rating out of 21 countries measured by Gallup.
* Source: http://www.gallup.com/poll/204074/north-korea-remains-least-popular-country-amongamericans.aspx?g_source=politics&g_medium=newsfeed&g_campaign=tiles

a. Suppose a random sample of 120 Americans is asked whether or not they view North Korea unfavorably. Verify the model requirements, and describe the sampling distribution of $\hat{p}$, the proportion of Americans who view North Korea unfavorably.
Verify: sample size is less than 5% of population size?
Verify: sample size is large, $np(1 - p) \geq 10$?
Describe: shape, mean, and standard deviation.

b. In a random sample of 120 Americans, what is the probability that the proportion who view North Korea unfavorably is greater than 0.90?

c. Would it be unusual for a sample of 120 Americans to reveal that 96 or fewer currently view North Korea unfavorably?