MATH 183 Random Samples and the Sample Mean
Dr. Neal, WKU

Henceforth, we shall assume that we are studying a particular measurement $X$ from a population $\Omega$ for which the mean $\mu$ and standard deviation $\sigma$ are unknown. When we have several sub-populations under consideration, we may denote the parameters by $\Omega_1, \mu_1, \sigma_1$ and $\Omega_2, \mu_2, \sigma_2$ in order to distinguish between different sub-groups within $\Omega$.

Random Sample

Our first goal is to estimate the mean $\mu$ and the standard deviation $\sigma$ of the population measurement $X$. If possible, we should first note the approximate population size $N$. For extremely large populations, as with a national survey of registered voters, we can assume that the population size is infinite. Then we must collect a random sample $x_1, x_2, \ldots, x_n$ of $n$ measurements. A properly collected random sample will have the following properties:

(i) Individual measurements $x_i$ are independent; that is, for all $i \ne j$, the $i$th response $x_i$ neither affects nor is affected by the $j$th response $x_j$.

(ii) The respondents are representative of the entire population; i.e., the sample is stratified.

For example, suppose we want a random sample from WKU's student body of approximately 19,000 students. Because 60% of the student body is female, 60% of our sample measurements should be from females. Because 15% of students are enrolled in the Community College, 15% of the responses should be from Community College students.

In general, it may be difficult to obtain a stratified random sample that is truly representative of the entire population with respect to all demographics such as age, race, sex, religion, political persuasion, etc. With modern databases, though, it becomes easier to choose a stratified random sample in certain situations.

Example 1. (A Sample of Voters) A roll of all registered voters $\Omega$ in a county, broken down by party affiliation, can be obtained. We wish to use the list to approximate the average age of registered voters in the county. Suppose there are $N = 8250$ registered voters in the county, of which 58% are Democrats $\Omega_1$, 36% are Republicans $\Omega_2$, and 6% are Other $\Omega_3$. We decide to take a random sample of $n = 850$ of the registered voters in the correct proportions and ask their age. How many of each category do we need?

Simply sample $850 \times 0.58 = 493$ Democrats, $850 \times 0.36 = 306$ Republicans, and $850 \times 0.06 = 51$ others, and obtain the age of each. (Also note that there are $N_1 = 8250 \times 0.58 = 4785$ Democrats, $N_2 = 8250 \times 0.36 = 2970$ Republicans, and $N_3 = 8250 \times 0.06 = 495$ others in the entire pool of registered voters.)
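The proportional split in Example 1 can be checked with a few lines of Python. This is only a sketch: the function name `proportional_allocation` and the dictionary layout are illustrative choices, not part of the original notes, and the code assumes whole-number percentages that divide evenly, as they do here.

```python
def proportional_allocation(total, shares_percent):
    """Split `total` across strata in proportion to integer percentage shares.

    Assumes the shares sum to 100 and that every product is a whole number,
    as in Example 1; other cases would need a rounding rule.
    """
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    allocation = {}
    for stratum, pct in shares_percent.items():
        count, remainder = divmod(total * pct, 100)
        if remainder:
            raise ValueError(f"{stratum}: {total} x {pct}% is not a whole number")
        allocation[stratum] = count
    return allocation


shares = {"Democrat": 58, "Republican": 36, "Other": 6}
sample_sizes = proportional_allocation(850, shares)    # n_i for each party
stratum_sizes = proportional_allocation(8250, shares)  # N_i for each party
print(sample_sizes)
print(stratum_sizes)
```

Integer arithmetic is used so that products such as 850 x 58% are computed exactly, with no floating-point rounding.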

Assuming the roll of voters is enumerated and broken down by party, here is one way to choose the sample: Choose a random integer from 1 to 4785 and call the person with that number from the Democrat list. Then mark out that person if they respond or ask to be left alone. Repeat the process until 493 persons have been contacted from the Democrat roll. If the same random integer reoccurs, then ignore it and choose another. Then proceed to the Republican roll by choosing random integers from 1 to 2970 until 306 persons are contacted. Then proceed in the same way to the combined rolls of the other registered voters, choosing random integers from 1 to 495 until 51 persons are contacted. (On the calculator, the randInt command is found under MATH, then PRB.)

Note: If the actual percentage breakdowns of each party are unknown, then we can choose our random integers from 1 to 8250 and sample 850 from among the whole group at once. This completely random sampling should still produce a sample that is close to being stratified along party lines.

Repeated Experiments

In most cases, we choose a random sample without replacement. Although the actual measurements we obtain may repeat, they come from different respondents. In Example 1, it is in fact a certainty that the same age will reoccur among different voters, because we would be asking the age of 850 people. But we never ask the same person twice, so the repeated responses are still obtained by choosing without replacement.

However, when obtaining measurements from experimental processes, we often model the procedure on sampling with replacement. In this case, we may independently perform the same experiment over and over in order to obtain a sequence of outcomes that may be repeated.

Example 2. (Rolling One Die) In order to test whether a regular six-sided die is loaded, we roll the die a total of 120 times to obtain the average roll. In this case, there are only six possible outcomes: Side 1, Side 2, ..., or Side 6 of the die. If we knew that the die was fair, so that each side was equally likely, then the true average would be $\mu = (1+2+3+4+5+6)/6 = 3.5$.

But these six outcomes do not represent a finite population of measurements, as with a population of registered voters. If they did, then the largest sample could be of size 6. Instead we obtain a random sample of size $n = 120$ (or any desired size) by repeatedly rolling the die. On each roll, any of the six outcomes may occur. In a sense, we simply choose one of the sides {1, 2, 3, 4, 5, 6}. Then we choose one of the sides again, with repeats allowed. Thus, in effect, we are sampling with replacement.
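The two sampling modes described above can be contrasted in a short Python sketch. The seed value and the variable names are arbitrary choices made for illustration. `random.sample` draws without replacement, so it plays the role of "ignore a repeated integer and choose another", while `random.choices` draws with replacement, as repeated die rolls do.

```python
import random

rng = random.Random(183)  # fixed seed (arbitrary) so the sketch is reproducible

# Without replacement: 493 distinct positions on an enumerated roll of 4785 names.
democrat_positions = rng.sample(range(1, 4786), k=493)

# With replacement: 120 rolls of a die, each an independent choice from six sides.
rolls = rng.choices([1, 2, 3, 4, 5, 6], k=120)
average_roll = sum(rolls) / len(rolls)

print(len(set(democrat_positions)))  # 493: every position is distinct
print(average_roll)                  # should land near 3.5 for a fair die
```

The simulated die here is fair by construction, so the printed average only illustrates the procedure; it says nothing about any physical die.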

How Many Possible Samples Are There?

Assume that we have a population of size $N$. This value could represent the number of outcomes in an experiment such as rolling a die, or it could be the number of distinct people in a population under study.

I. Choosing a sequence of length $n$ in order with replacement: Since repeats are allowed, there are always $N$ possibilities for each individual choice. Thus: From a population of size $N$, there are $N^n$ possible sequences of length $n$ when repeats are allowed.

II. Choosing a sample of size $n \le N$ without repeats, without regard to order: Now we choose $n$ objects all at once in a group without aligning them in the order of choice. Such a selection is called a combination. From a population of size $N$, there are

$$N \text{ nCr } n = \frac{N!}{n!\,(N-n)!}$$

possible combinations of size $n$, where $n \le N$.

III. Choosing a stratified random sample: Suppose the population is divided into sub-populations of sizes $N_1, N_2, \ldots, N_k$, where $N_1 + N_2 + \cdots + N_k = N$. We choose proportional samples (combinations) from each sub-population of sizes $n_1, n_2, \ldots, n_k$ respectively, where $n_1 + n_2 + \cdots + n_k = n$. (We assume that $n_i / n = N_i / N$ for all $i$ in order to have a properly representative sample.) There are

$$(N_1 \text{ nCr } n_1) \times (N_2 \text{ nCr } n_2) \times \cdots \times (N_k \text{ nCr } n_k)$$

ways to choose a stratified random sample.

Example 3. In a class of 30 students, there are 16 females and 14 males. How many ways are there to pick a sample of 5 students in the following settings:
(a) On 5 consecutive days, a student is chosen at random with repeats allowed.
(b) On one day, 5 students are chosen at random all at once.
(c) Three females and two males are chosen all at once.

Solution. (a) There are 30 possibilities each day, so there are $30^5 = 24{,}300{,}000$ ways.
(b) Now there are $30 \text{ nCr } 5 = \frac{30!}{25! \times 5!} = 142{,}506$ ways. This value can be computed by entering 30 nCr 5 using the nCr command from the MATH PRB menu.
(c) There are $(16 \text{ nCr } 3) \times (14 \text{ nCr } 2) = 50{,}960$ ways to choose 3 females and 2 males all at once.

Example 4. How many possible sequences of 120 rolls of a six-sided die are there?

Solution. On each roll there are always 6 possibilities; thus, there are $6^{120} \approx 2.3886364 \times 10^{93}$ possible sequences of length 120. Each time you make a sequence of 120 rolls, you will in all likelihood obtain a different sequence, such as 6, 4, 6, 2, 1, 3, 4, ..., 5. But any single such sequence can be averaged to approximate the average roll (which should be around 3.5 for a fair die).

Example 5. (a) How many ways are there to choose a random sample of size 51 from a population of 495 people? (b) How many ways are there to choose the stratified random sample of Example 1 of 493 Democrats, 306 Republicans, and 51 Others?

Solution. (a) Choosing a combination (all at once, without regard to order), there are $495 \text{ nCr } 51 \approx 1.1898 \times 10^{70}$ possible random samples.
(b) There are $(4785 \text{ nCr } 493) \times (2970 \text{ nCr } 306) \times (495 \text{ nCr } 51)$ ways to choose the random sample in Example 1, a value so large that the calculator reports an overflow.

Values like $2.3886364 \times 10^{93}$ and $1.1898 \times 10^{70}$ are literally too large for the mind to comprehend. There is no conceivable way to list all possible samples, and there is virtually no possibility of any two independent random samples turning out the same way. Thus, two different random samples will almost always yield different sample means $\bar{x}$. Yet any single sample mean $\bar{x}$ should be sufficient to estimate the true population average $\mu$.
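The counts in Examples 3 through 5 can be reproduced with Python's `math.comb`, which returns exact integers of any size. This is a sketch in Python rather than the calculator's nCr command, and the variable names are illustrative.

```python
from math import comb

# Example 3: a class of 30 students (16 females, 14 males).
ordered_with_repeats = 30 ** 5              # (a) one pick per day, repeats allowed
one_group_of_five = comb(30, 5)             # (b) five students chosen at once
three_f_two_m = comb(16, 3) * comb(14, 2)   # (c) stratified by sex

# Example 4: sequences of 120 die rolls.
die_sequences = 6 ** 120

# Example 5(b): the stratified sample of Example 1, as an exact integer.
stratified_voters = comb(4785, 493) * comb(2970, 306) * comb(495, 51)

print(ordered_with_repeats, one_group_of_five, three_f_two_m)
print(f"{die_sequences:.7e}")
print(stratified_voters > 10 ** 100)  # True: far beyond a 100-digit display
```

Because Python integers are unbounded, the product in Example 5(b) is computed exactly here even though it is too large for a handheld calculator to display.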

The Sample Mean

After properly obtaining a random sample of $n$ measurements $x_1, x_2, \ldots, x_n$ from a population of size $N$, we then compute the sample mean $\bar{x}$ by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

A sample mean is only an estimate of the true population mean $\mu$. The collection of all possible sample means $\bar{x}$ has the following properties:

$$\mu_{\bar{x}} = \mu$$

The average of all possible sample means is the true population average $\mu$.

$$\sigma_{\bar{x}} = \begin{cases} \dfrac{\sigma}{\sqrt{n}} & \text{with replacement (or "large" populations)} \\[1ex] \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} & \text{without replacement (for population size } N\text{)} \end{cases}$$

Regardless of whether we sample with or without replacement, the average of all possible sample means from random samples of size $n$ always equals the true overall population average $\mu$. However, the standard deviation $\sigma_{\bar{x}}$ of all possible sample means depends on whether we sample with or without replacement. But in either case, $\sigma_{\bar{x}}$ is a fraction of the true population standard deviation $\sigma$.

These properties still hold when we choose a stratified random sample, as in Example 1, as long as we pick samples from each segment of the population that are in proportion to the sub-population ratios. If we continually oversample one or more portions of the population, then the average of the sample means will be skewed.

When the population size $N$ is very large in comparison to the sample size $n$, then $\frac{N-n}{N-1} \approx 1$; thus, $\sigma_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}}$.

As the sample size $n$ increases, $\sigma_{\bar{x}}$ decreases to 0. Therefore, as $n$ increases, the values of the sample means will be consistently closer to the true population mean $\mu$. (This is a good Final Exam Question.)
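For a very small population, both properties can be checked by brute force, listing every possible sample. The sketch below uses a made-up population of six values (an assumption for illustration only, not data from the notes) and compares the results with the two formulas for $\sigma_{\bar{x}}$. Here `pstdev` is the population standard deviation (divisor $N$), which is the convention the formulas require.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 4, 7, 9, 10]   # hypothetical measurements, N = 6
N, n = len(population), 3
mu, sigma = mean(population), pstdev(population)

# Every possible sample mean under each sampling scheme.
means_with = [mean(s) for s in product(population, repeat=n)]   # N**n sequences
means_without = [mean(s) for s in combinations(population, n)]  # N nCr n groups

print(mean(means_with), mean(means_without), mu)   # all three agree
print(pstdev(means_with), sigma / sqrt(n))         # with replacement
print(pstdev(means_without), sigma / sqrt(n) * sqrt((N - n) / (N - 1)))
```

Each printed pair should match up to floating-point rounding. The repeated value 4 is treated as two different individuals, just as two voters of the same age are different respondents.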

[Figure: A Typical Distribution of Sample Means. Two plots, each marked at $\mu$. First plot: a small sample size $n$ creates wide deviation in the values of $\bar{x}$. Second plot: a larger $n$ creates less deviation in the values of $\bar{x}$.]

Example 6. Each student in a class of 30 is measured in height. The true class average is $\mu = 68$ inches with a standard deviation of $\sigma = 3$ inches. Various random samples of size $n = 5$ are taken and the sample means of the heights are recorded. (a) What is the average of all possible sample means $\bar{x}$? (b) What is the standard deviation of all possible sample means $\bar{x}$?

Solution. (a) $\mu_{\bar{x}} = \mu = 68$ inches.
(b) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{3}{\sqrt{5}} \sqrt{\frac{30-5}{30-1}} = \frac{3}{\sqrt{5}} \sqrt{\frac{25}{29}} \approx 1.24568$ in.

Example 7. Among all adults, the true average height is $\mu = 68$ inches with a standard deviation of $\sigma = 4$ inches. Various nationwide random samples of size $n = 900$ are taken and the sample means of the heights are recorded. (a) What is the average of all possible sample means $\bar{x}$? (b) What is the standard deviation of all possible sample means $\bar{x}$?

Solution. (a) $\mu_{\bar{x}} = \mu = 68$ inches.
(b) $\sigma_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{900}} \approx 0.1333$ in.

With samples of size $n = 900$, there is very little deviation in the sample means. Most sample means will be very close to 68 inches.
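The two calculations in Examples 6 and 7 differ only in whether the finite-population factor is applied. A small helper makes that explicit. The function name `sd_of_sample_mean` and its optional `N` parameter are hypothetical conveniences, not notation from the notes.

```python
from math import sqrt


def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of all possible sample means of size n.

    With N=None the population is treated as effectively infinite (or as
    sampled with replacement); otherwise the finite-population factor
    sqrt((N - n) / (N - 1)) is applied.
    """
    sd = sigma / sqrt(n)
    if N is not None:
        sd *= sqrt((N - n) / (N - 1))
    return sd


print(round(sd_of_sample_mean(3, 5, N=30), 5))   # Example 6: 1.24568
print(round(sd_of_sample_mean(4, 900), 4))       # Example 7: 0.1333
```

Leaving `N` unset mirrors the "large population" approximation used for the nationwide samples in Example 7.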

Practice Exercises

1. A small college has about 1800 students, of whom roughly 30% are lowerclassmen, 60% are upperclassmen, and 10% are graduate students. A random sample of 200 students is to be taken.
(a) How many of each group should be sampled in order to stratify the sample?
(b) How many ways are there to choose the lowerclassmen?

2. A high school of 320 students has 60 freshmen, 78 sophomores, 80 juniors, and 102 seniors. It is 55% female. A random sample of 48 students is to be chosen.
(a) How many ways are there to choose 48 at random from the whole group of 320?
(b) How many of each class should be chosen in order to stratify by class?
(c) How many males and females should be chosen in order to stratify by sex?
(d) How many ways are there to choose the stratified sample in Part (c)?

3. Fifty patients are being treated for a condition with a new medication. After three weeks, each is noted for whether there is (i) improvement, (ii) worsening, or (iii) no change. For every fifty such patients, how many possible outcomes are there for this experimental treatment?

4. An entering freshman class has size 901 and a true average ACT score of $\mu = 21.4$ with a standard deviation of $\sigma = 3.2$. Random samples of size 60 are to be taken from the population.
(a) How many distinct random samples of size 60 are possible?
(b) What is the average $\mu_{\bar{x}}$ of all possible sample means?
(c) What is the standard deviation $\sigma_{\bar{x}}$ of all possible sample means?
(d) If the random sample of size 60 were to come from all students nationwide who took the test, and assuming that $\mu = 21.4$ and $\sigma = 3.2$ still, then what would be the average and standard deviation of all possible sample means?

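Solutions

1. (a) To stratify the sample of size 200, we need $200 \times 0.30 = 60$ lowerclassmen, $200 \times 0.60 = 120$ upperclassmen, and $200 \times 0.10 = 20$ graduate students.
(b) There are $1800 \times 0.30 = 540$ lowerclassmen in the college. So there are $540 \text{ nCr } 60 \approx 3.5 \times 10^{80}$ ways to choose 60 of the 540 lowerclassmen.

2. (a) There are $320 \text{ nCr } 48 \approx 3.47 \times 10^{57}$ ways to choose.
(b) $48 \times \frac{60}{320} = 9$ Fr, $48 \times \frac{78}{320} = 11.7 \approx 12$ Soph, $48 \times \frac{80}{320} = 12$ Jr, $48 \times \frac{102}{320} = 15.3 \approx 15$ Sr.
(c) Choose $48 \times 0.55 \approx 26$ females and $48 \times 0.45 \approx 22$ males.
(d) There are $320 \times 0.55 = 176$ females and 144 males altogether. So there are $(176 \text{ nCr } 26) \times (144 \text{ nCr } 22) \approx 4.3 \times 10^{56}$ ways to choose the sample in Part (c).

3. For each patient, there are 3 possibilities: better, worse, no change. Thus, for every 50 patients, there are $3^{50} \approx 7.179 \times 10^{23}$ possible results.

4. (a) $901 \text{ nCr } 60 \approx 3.095437 \times 10^{94}$.
(b) $\mu_{\bar{x}} = \mu = 21.4$.
(c) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{3.2}{\sqrt{60}} \sqrt{\frac{901-60}{901-1}} = \frac{3.2}{\sqrt{60}} \sqrt{\frac{841}{900}} \approx 0.3993476$.
(d) $\mu_{\bar{x}} = \mu = 21.4$ and $\sigma_{\bar{x}} \approx \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{60}} \approx 0.413118$.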
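In Exercise 2 the proportional shares are not whole numbers, so each is rounded, and it is worth confirming that the rounded counts still total the intended sample size. The sketch below does that check. The function name `rounded_allocation` is illustrative, and simple rounding of each share is an assumption that happens to work for these numbers, not a general-purpose rule.

```python
def rounded_allocation(n, stratum_sizes):
    """Round n * N_i / N for each stratum and confirm the parts still total n."""
    N = sum(stratum_sizes.values())
    allocation = {name: round(n * size / N) for name, size in stratum_sizes.items()}
    if sum(allocation.values()) != n:
        raise ValueError("rounded allocations do not sum to n; adjust by hand")
    return allocation


by_class = rounded_allocation(48, {"Fr": 60, "Soph": 78, "Jr": 80, "Sr": 102})
by_sex = rounded_allocation(48, {"female": 176, "male": 144})
print(by_class)
print(by_sex)
```

If the rounded counts ever failed to sum to the sample size, the function raises an error instead of silently returning a sample of the wrong size.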