
MATH 183 Random Samples and the Sample Mean
Dr. Neal, WKU

Henceforth, we shall assume that we are studying a particular measurement $X$ from a population $\Omega$ for which the mean $\mu$ and standard deviation $\sigma$ are unknown. When we have several sub-populations under consideration, we may denote the parameters by such variables as $\Omega_1, \mu_1, \sigma_1$ and $\Omega_2, \mu_2, \sigma_2$ in order to distinguish between different sub-groups within $\Omega$.

Random Sample

Our first goal is to estimate the mean $\mu$ and the standard deviation $\sigma$ of the population measurement $X$. If possible, we first should note the approximate population size $N$. For extremely large populations, as with a national survey of registered voters, we can assume that we have an infinite population size. Then we must collect a random sample $x_1, x_2, \ldots, x_n$ of $n$ measurements. A properly collected random sample will have the following properties:

(i) Individual measurements $x_i$ are independent; that is, for all $i \ne j$, the $i$th response $x_i$ is not affected by, nor does it affect, the $j$th response $x_j$.

(ii) The respondents are representative of the entire population; i.e., the sample is stratified.

For example, suppose we want a random sample from WKU's student body of approximately 19,000 students. Because 60% of the student body is female, 60% of our sample measurements should be from females. Because 15% of students are enrolled in the Community College, 15% of the responses should be from Community College students.

In general, it may be difficult to obtain such a stratified random sample that is truly representative of the entire population with respect to all demographics such as age, race, sex, religion, political persuasion, etc. With modern databases, though, it becomes easier to choose a stratified random sample in certain situations.

Example 1. (A Sample of Voters) A roll of all registered voters $\Omega$ in a county can be obtained which is broken down by party affiliation.
We wish to use the list to approximate the average age of registered voters in the county. Suppose there are $N = 8250$ registered voters in the county, of which 58% are Democrats ($\Omega_1$), 36% are Republicans ($\Omega_2$), and 6% are Other ($\Omega_3$). We decide to take a random sample of $n = 850$ of the registered voters in the correct proportions and ask their age. How many of each category do we need?

Simply sample $850 \times 0.58 = 493$ Democrats, $850 \times 0.36 = 306$ Republicans, and $850 \times 0.06 = 51$ others, and obtain the age of each. (Also note that there are $N_1 = 8250 \times 0.58 = 4785$ Democrats, $N_2 = 8250 \times 0.36 = 2970$ Republicans, and $N_3 = 8250 \times 0.06 = 495$ others in the entire pool of registered voters.)
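The proportional allocation in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the function name and the dictionary layout are illustrative choices, and rounding each stratum separately is assumed to be acceptable (in general the rounded counts need not add up to the intended total, although here they do).

```python
# Proportional (stratified) allocation for Example 1.
# The function name and data layout are illustrative, not part of the notes.

def proportional_allocation(total, shares):
    """Return total * share for each stratum, rounded to the nearest integer."""
    return {name: round(total * share) for name, share in shares.items()}

shares = {"Democrat": 0.58, "Republican": 0.36, "Other": 0.06}

sample_sizes = proportional_allocation(850, shares)    # n_1, n_2, n_3
stratum_sizes = proportional_allocation(8250, shares)  # N_1, N_2, N_3

print(sample_sizes)   # {'Democrat': 493, 'Republican': 306, 'Other': 51}
print(stratum_sizes)  # {'Democrat': 4785, 'Republican': 2970, 'Other': 495}
```

Checking that the rounded sample sizes still sum to 850 is a sensible safeguard whenever the percentages do not divide the sample size evenly.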

Assuming the roll of voters is enumerated and broken down by party, here is one way to choose the sample: Choose a random integer from 1 to 4785 and call the person with that number on the Democrat list. Cross that person off the list once they respond or ask to be left alone. Repeat the process until 493 persons have been contacted from the Democrat roll. If the same random integer recurs, ignore it and choose another. Then proceed to the Republican roll by choosing random integers from 1 to 2970 until 306 persons are contacted. Then proceed to the combined rolls of the other registered voters. (Find the randint command under MATH then PRB.)

Note: If the actual percentage breakdowns of each party are unknown, then we can choose our random integers from 1 to 8250 and sample 850 from among the whole group at once. This completely random sampling should still produce a sample that is close to being stratified along party lines.

Repeated Experiments

In most cases, we choose a random sample without replacement. Although the actual measurements we obtain may repeat, they come from different respondents. In Example 1, it is in fact a certainty that the same age will recur among different voters because we would be asking the age of 850 people. But we never ask the same person twice, so the repeated responses are still obtained by choosing without replacement.

However, when obtaining measurements from experimental processes, we often model the procedure on sampling with replacement. In this case, we may independently perform the same experiment over and over in order to obtain a sequence of outcomes that may be repeated.

Example 2. (Rolling One Die) In order to test whether a regular six-sided die is loaded, we roll the die a total of 120 times to obtain the average roll. In this case, there are only six possible outcomes: Side 1, Side 2, ..., or Side 6 of the die.
If we knew that the die were fair, so that each side was equally likely, then the true average would be $\mu = (1+2+3+4+5+6)/6 = 3.5$. But these six outcomes do not represent a finite population of measurements, as with a population of registered voters. If they did, then the largest sample could be of size 6. Instead we obtain a random sample of size $n = 120$ (or any desired size) by repeatedly rolling the die. On each roll, any of the six outcomes may occur. In a sense, we simply choose one of the sides $\{1, 2, 3, 4, 5, 6\}$. Then we choose one of the sides again with repeats allowed. Thus, in effect, we are sampling with replacement.
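The two sampling schemes described above, drawing distinct voters from an enumerated roll and rolling a die repeatedly, can be sketched with Python's standard random module in place of a calculator. The seed and the variable names are arbitrary choices made for this illustration, and the die is assumed to be fair.

```python
import random

rng = random.Random(183)  # fixed seed (arbitrary) so the sketch is repeatable

# Without replacement: 493 distinct positions on the Democrat roll (1 to 4785).
# sample() never returns the same position twice, which matches the rule
# "if the same random integer recurs, ignore it and choose another".
democrat_positions = rng.sample(range(1, 4786), k=493)

# With replacement: 120 rolls of a die that is assumed to be fair.
rolls = rng.choices(range(1, 7), k=120)
average_roll = sum(rolls) / len(rolls)

print(len(set(democrat_positions)))  # 493, since no position repeats
print(average_roll)                  # should land near 3.5 for a fair die
```

The same pattern with range(1, 2971) and k=306, then range(1, 496) and k=51, would cover the Republican and Other rolls.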

How Many Possible Samples Are There?

Assume that we have a population of size $N$. This value could represent the number of outcomes in an experiment such as rolling a die, or it could be the number of distinct people in a population under study.

I. Choosing a sequence of length $n$ in order with replacement: Since repeats are allowed, there are always $N$ possibilities for each individual choice. Thus: From a population of size $N$, there are $N^n$ possible sequences of length $n$ when repeats are allowed.

II. Choosing a sample of size $n \le N$ without repeats, without regard to order: Now we choose $n$ objects all at once in a group without aligning them in the order of choice. Such a selection is called a combination. From a population of size $N$, there are

$$N \text{ nCr } n = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

possible combinations of size $n$, where $n \le N$.

III. Choosing a stratified random sample: Suppose the population is divided into sub-populations of sizes $N_1, N_2, \ldots, N_k$, where $N_1 + N_2 + \cdots + N_k = N$. We choose proportional samples (combinations) from each sub-population of sizes $n_1, n_2, \ldots, n_k$ respectively, where $n_1 + n_2 + \cdots + n_k = n$. (We assume that $n_i / n = N_i / N$ for all $i$ in order to have a properly representative sample.) There are

$$\binom{N_1}{n_1} \times \binom{N_2}{n_2} \times \cdots \times \binom{N_k}{n_k}$$

ways to choose a stratified random sample.

Example 3. In a class of 30 students, there are 16 females and 14 males. How many ways are there to pick a sample of 5 students in the following settings:
(a) On 5 consecutive days, a student is chosen at random with repeats allowed.
(b) On one day, 5 students are chosen at random all at once.
(c) Three females and two males are chosen all at once.

Solution.
(a) There are 30 possibilities each day, so there are $30^5 = 24{,}300{,}000$ ways.
(b) Now there are $30 \text{ nCr } 5 = \dfrac{30!}{25!\,5!} = 142{,}506$ ways. This value can be computed by entering 30 nCr 5 using the nCr command from the MATH PRB menu.
(c) There are $(16 \text{ nCr } 3) \times (14 \text{ nCr } 2) = 50{,}960$ ways to choose 3 females and 2 males all at once.

Example 4. How many possible sequences of 120 rolls of a six-sided die are there?

Solution. On each roll there are always 6 possibilities; thus, there are $6^{120} \approx 2.3886364 \times 10^{93}$ possible sequences of length 120. Each time you make a sequence of 120 rolls, you will almost certainly obtain a different sequence, such as 6, 4, 6, 2, 1, 3, 4, ..., 5. But any single such sequence can be averaged to approximate the average roll (which should be around 3.5 for a fair die).

Example 5.
(a) How many ways are there to choose a random sample of size 51 from a population of 495 people?
(b) How many ways are there to choose the stratified random sample of Example 1 of 493 Democrats, 306 Republicans, and 51 Others?

Solution.
(a) Choosing a combination (all at once, without regard to order), there are $495 \text{ nCr } 51 \approx 1.1898 \times 10^{70}$ possible random samples.
(b) There are $(4785 \text{ nCr } 493) \times (2970 \text{ nCr } 306) \times (495 \text{ nCr } 51)$ ways to choose the random sample in Example 1. (This product overflows the calculator.)

Values like $2.3886364 \times 10^{93}$ and $1.1898 \times 10^{70}$ are literally too large for the mind to comprehend. There is no conceivable way to list all possible samples, and there is virtually no possibility of any two independent random samples turning out the same way. Thus, two different random samples will generally yield different sample means $\bar{x}$. Yet any single sample mean $\bar{x}$ should be sufficient to estimate the true population average $\mu$.
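The counts in Examples 3 through 5 can also be checked in Python, whose integers do not overflow. The sketch below uses math.comb from the standard library as a stand-in for the calculator's nCr command; the variable names are illustrative only.

```python
from math import comb

# Example 3
ex3a = 30 ** 5                     # sequences of length 5 with repeats
ex3b = comb(30, 5)                 # combinations of 5 from 30
ex3c = comb(16, 3) * comb(14, 2)   # 3 of 16 females and 2 of 14 males

# Example 4
ex4 = 6 ** 120                     # sequences of 120 die rolls

# Example 5
ex5a = comb(495, 51)
ex5b = comb(4785, 493) * comb(2970, 306) * comb(495, 51)

print(ex3a, ex3b, ex3c)            # 24300000 142506 50960
print(f"{ex4:.7e}")                # scientific notation for 6^120
print(f"{ex5a:.4e}")               # scientific notation for 495 nCr 51
print(len(str(ex5b)), "digits")    # size of the count that overflows a calculator
```

The last line reports only the number of digits in the stratified count, because that value is too large to convert to an ordinary floating-point number.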

The Sample Mean

After properly obtaining a random sample of $n$ measurements $x_1, x_2, \ldots, x_n$ from a population of size $N$, we then compute the sample mean $\bar{x}$ by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

A sample mean is only an estimate of the true population mean $\mu$. The collection of all possible sample means $\bar{x}$ has the following properties:

$$\mu_{\bar{x}} = \mu$$

The average of all possible sample means is the true population average $\mu$.

$$\sigma_{\bar{x}} = \begin{cases} \dfrac{\sigma}{\sqrt{n}} & \text{with replacement (or "large" populations)} \\[2ex] \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} & \text{without replacement (for population size } N\text{)} \end{cases}$$

Regardless of whether we sample with or without replacement, the average of all possible sample means from random samples of size $n$ always equals the true overall population average $\mu$. However, the standard deviation $\sigma_{\bar{x}}$ of all possible sample means depends on whether we sample with or without replacement. But in either case $\sigma_{\bar{x}}$ is a fraction of the true population standard deviation $\sigma$.

These properties still hold when we choose a stratified random sample, as in Example 1, as long as we pick samples from each segment of the population that are in proportion to the sub-population ratios. If we continually oversample one or more portions of the population, then the average of the sample means will be skewed.

When the population size $N$ is very large in comparison to the sample size $n$, then $\dfrac{N-n}{N-1} \approx 1$; thus, $\sigma_{\bar{x}} \approx \dfrac{\sigma}{\sqrt{n}}$.

As the sample size $n$ increases, $\sigma_{\bar{x}}$ decreases to 0. Therefore as $n$ increases, the values of the sample means will be consistently closer to the true population mean $\mu$. (This is a good Final Exam Question.)
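For a population small enough to list every possible sample, these two properties can be verified by brute force. The sketch below uses a made-up population of six measurements (hypothetical values, not data from these notes) and samples of size 3, enumerating all ordered samples with replacement and all combinations without replacement.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 4, 7, 9, 10]   # hypothetical measurements for illustration
N, n = len(population), 3
mu = mean(population)              # true population mean
sigma = pstdev(population)         # true population standard deviation

# Every possible sample mean: N^n ordered samples with replacement,
# and C(N, n) combinations without replacement.
means_with = [sum(s) / n for s in product(population, repeat=n)]
means_without = [sum(s) / n for s in combinations(population, n)]

mean_with, sd_with = mean(means_with), pstdev(means_with)
mean_without, sd_without = mean(means_without), pstdev(means_without)

# Values predicted by the formulas for the standard deviation of x-bar.
predicted_with = sigma / sqrt(n)
predicted_without = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(mu, mean_with, mean_without)        # all three should agree
print(sd_with, predicted_with)            # should agree
print(sd_without, predicted_without)      # should agree
```

Enumeration is only feasible for tiny cases like this one; for realistic populations the formulas are the practical route.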

A Typical Distribution of Sample Means

[Figure: distributions of the sample means $\bar{x}$ centered at $\mu$. A small sample size $n$ creates wide deviation in the values of $\bar{x}$. A larger $n$ creates less deviation in the values of $\bar{x}$.]

Example 6. Each student in a class of 30 is measured in height. The true class average is $\mu = 68$ inches with a standard deviation of $\sigma = 3$ inches. Various random samples of size $n = 5$ are taken and the sample means of the heights are recorded.
(a) What is the average of all possible sample means $\bar{x}$?
(b) What is the standard deviation of all possible sample means $\bar{x}$?

Solution.
(a) $\mu_{\bar{x}} = \mu = 68$ inches.
(b) $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} = \dfrac{3}{\sqrt{5}} \sqrt{\dfrac{30-5}{30-1}} = \dfrac{3}{\sqrt{5}} \sqrt{\dfrac{25}{29}} \approx 1.24568$ in.

Example 7. Among all adults, the true average height is $\mu = 68$ inches with a standard deviation of $\sigma = 4$ inches. Various nationwide random samples of size $n = 900$ are taken and the sample means of the heights are recorded.
(a) What is the average of all possible sample means $\bar{x}$?
(b) What is the standard deviation of all possible sample means $\bar{x}$?

Solution.
(a) $\mu_{\bar{x}} = \mu = 68$ inches.
(b) $\sigma_{\bar{x}} \approx \dfrac{\sigma}{\sqrt{n}} = \dfrac{4}{\sqrt{900}} \approx 0.1333$ in.

With samples of size $n = 900$, there is very little deviation in the sample means. Most sample means will be very close to 68 inches.
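The calculations in Examples 6 and 7 follow the same pattern, so they can be wrapped in one small helper. This is a sketch only: the function name and the convention of passing N=None for a "large" population are choices made for this illustration.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of all possible sample means.

    With N omitted, returns sigma / sqrt(n) (with replacement, or a large
    population). With N given, multiplies by sqrt((N - n) / (N - 1)).
    """
    result = sigma / sqrt(n)
    if N is not None:
        result *= sqrt((N - n) / (N - 1))
    return result

ex6 = sd_of_sample_mean(3, 5, N=30)   # class of 30, samples of size 5
ex7 = sd_of_sample_mean(4, 900)       # nationwide, samples of size 900

print(round(ex6, 5))  # 1.24568
print(round(ex7, 4))  # 0.1333
```

Calling the helper with increasing n and a fixed sigma shows the result shrinking toward 0, which is the behavior described for large samples.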

Practice Exercises

1. A small college has about 1800 students, of whom roughly 30% are lowerclassmen, 60% are upperclassmen, and 10% are graduate students. A random sample of 200 students is to be taken.
(a) How many of each group should be sampled in order to stratify the sample?
(b) How many ways are there to choose the lowerclassmen?

2. A high school of 320 students has 60 freshmen, 78 sophomores, 80 juniors, and 102 seniors. It is 55% female. A random sample of 48 students is to be chosen.
(a) How many ways are there to choose 48 at random from the whole group of 320?
(b) How many of each class should be chosen in order to stratify by class?
(c) How many males and females should be chosen in order to stratify by sex?
(d) How many ways are there to choose the stratified sample in Part (c)?

3. Fifty patients are being treated for a condition with a new medication. After three weeks, each is noted for whether there is (i) improvement, (ii) worsening, or (iii) no change. For every fifty such patients, how many possible outcomes are there for this experimental treatment?

4. An entering freshman class has size 901 and a true average ACT score of $\mu = 21.4$ with a standard deviation of $\sigma = 3.2$. Random samples of size 60 are to be taken from the population.
(a) How many distinct random samples of size 60 are possible?
(b) What is the average $\mu_{\bar{x}}$ of all possible sample means?
(c) What is the standard deviation $\sigma_{\bar{x}}$ of all possible sample means?
(d) If the random sample of size 60 were to come from all students nationwide who took the test, and assuming that $\mu = 21.4$ and $\sigma = 3.2$ still, then what would be the average and standard deviation of all possible sample means?

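Solutions

1. (a) To stratify the sample of size 200, we need $200 \times 0.30 = 60$ lowerclassmen, $200 \times 0.60 = 120$ upperclassmen, and $200 \times 0.10 = 20$ graduate students.
(b) There are $1800 \times 0.30 = 540$ lowerclassmen in the college. So there are $540 \text{ nCr } 60 \approx 3.5 \times 10^{80}$ ways to choose 60 of the 540 lowerclassmen.

2. (a) There are $320 \text{ nCr } 48 \approx 3.47 \times 10^{57}$ ways to choose.
(b) Freshmen: $48 \times \dfrac{60}{320} = 9$. Sophomores: $48 \times \dfrac{78}{320} = 11.7 \approx 12$. Juniors: $48 \times \dfrac{80}{320} = 12$. Seniors: $48 \times \dfrac{102}{320} = 15.3 \approx 15$.
(c) Choose $48 \times 0.55 \approx 26$ females and $48 \times 0.45 \approx 22$ males.
(d) There are $320 \times 0.55 = 176$ females and 144 males altogether. So there are $(176 \text{ nCr } 26) \times (144 \text{ nCr } 22) \approx 4.3 \times 10^{56}$ ways to choose the sample in Part (c).

3. For each patient, there are 3 possibilities: better, worse, no change. Thus, for every 50 patients, there are $3^{50} \approx 7.179 \times 10^{23}$ possible results.

4. (a) $901 \text{ nCr } 60 \approx 3.095437 \times 10^{94}$.
(b) $\mu_{\bar{x}} = \mu = 21.4$.
(c) $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} = \dfrac{3.2}{\sqrt{60}} \sqrt{\dfrac{901-60}{901-1}} = \dfrac{3.2}{\sqrt{60}} \sqrt{\dfrac{841}{900}} \approx 0.3993476$.
(d) $\mu_{\bar{x}} = \mu = 21.4$ and $\sigma_{\bar{x}} \approx \dfrac{\sigma}{\sqrt{n}} = \dfrac{3.2}{\sqrt{60}} \approx 0.413118$.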