Chapter 2: Simple Random Sampling and a Brief Review of Probability


Eileen Goodman
6 years ago

Forest Before the Trees

Chapters 2-6 primarily investigate survey analysis. We begin with the basic analyses:

- Those that differ according to the sampling method.
- Those that differ by making use of extra information contained in covariates (Chapter 3).

Along the way we'll consider the differences between the basic sampling methods from a design perspective as well. But before diving into these topics, we should look at the big picture first (i.e., what these basic forms of sample are). As previously discussed, a sampling method can be classified as probability sampling or non-probability sampling.

Probability Sampling
- Simple Random Sampling
- Stratified Sampling
- Cluster Sampling
- Complex Sampling

Non-Probability Sampling
- Convenience Sampling
- Quota Sampling
- Snowball Sampling
- Judgement Sampling

Meaningful analysis requires a probability sampling method. Only these methods let us know the sampling distribution of estimators, which in turn gives us a grasp of the sampling variability. Thus the methods we will concentrate on are:

Simple Random Sampling

This is the sampling method we are used to. An underlying assumption of any intro stats course is that the random sample was obtained as a simple random sample. A simple random sample (SRS) is the simplest sampling method from a conceptual and analytical perspective. It is not, however, necessarily the easiest to carry out.

Definition: An SRS is one in which every sample of size n has equal probability of being selected.

Example: A wildlife biologist wants to estimate the number of trees in a forest affected by a new parasite introduced to the area via global warming. Rather than investigate every tree, he divides the map of the forest into square meters, randomly selects 100 of the squares and then investigates any tree in each selected square (or the tree closest to it in the absence of a tree). Every subset of 100 squares is equally likely to be selected. Within a few days the task is done and the information obtained, which is much more realistic than investigating every tree!

Stratified Sampling

Stratified samples can have many advantages over an SRS, such as increased precision of estimates and (possibly) easier implementation.

Definition: Stratified sampling has two stages. In the first, the population is divided into groups (strata). In the second, an SRS is taken in each group.

Example: The wildlife biologist thinks there will be a difference in infection rate according to altitude. He first divides the forest into three regions according to altitude and then samples from each of the regions.
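As a rough illustration of the two definitions above, the following sketch draws an SRS and a stratified sample from a made-up frame of map squares. The frame size, the altitude labels and the allocation of 30 squares per stratum are all assumptions for the sake of the example, not values from the notes.

```python
import random

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# Hypothetical frame: 3,000 map squares, each tagged with an altitude band.
squares = list(range(3000))
altitude = {sq: ("low", "mid", "high")[sq % 3] for sq in squares}

# SRS without replacement: every subset of 100 squares is equally likely.
srs = rng.sample(squares, 100)

# Stratified sample: split the frame into strata, then take an SRS in each.
strata = {}
for sq in squares:
    strata.setdefault(altitude[sq], []).append(sq)

per_stratum = 30  # assumed allocation; the notes do not specify one
stratified = {band: rng.sample(units, per_stratum)
              for band, units in strata.items()}
```

Note that the stratified draw guarantees every altitude band is represented, whereas the SRS only does so by chance.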

Cluster Sampling

Cluster sampling has the major advantage of being easy to carry out, but the precision of the estimates decreases.

Definition: Cluster sampling has two or more stages. In two-stage sampling, we first sample groups using an SRS and then sample units from the selected groups.

Example: The forest that is being studied has three natural subregions separated by rivers. In order to minimize travel time, the biologist randomly selects one of the regions and then randomly selects squares from that region.

Why are these probability sampling methods? Randomness is not the only ingredient required. There needs to be order to the randomness. Specifically, we need to know what units are in the population and what probability each one has of being selected. Thus a brief review of probability is in order.

Probability

Statistical results are built entirely upon the results of probability.

Probability of Events

Sample Space: The set of all possible outcomes for a given process, which we'll denote by Ω.

Event: A subset of outcomes from the sample space (Ω) that we are interested in, denoted by capital letters. Events either occur or do not occur.

Population or Universe: The collection of all units we wish to study, which we'll denote by U.

Axioms and Properties of Probabilities

Let P(E) denote the probability that event E occurs. Here P(E) is a quantity which needs to follow certain rules.

Kolmogorov Axioms of Probability:

1. The probability of an event is a non-negative real number: $P(E) \ge 0$.
2. The probability that some elementary event in the sample space occurs is 1: $P(\Omega) = 1$.
3. If $E_1, E_2, \ldots, E_N$ is a finite sequence of pairwise disjoint events, then $P(E_1 \cup E_2 \cup \cdots \cup E_N) = \sum_{i=1}^{N} P(E_i)$.

These axioms ensure that probability assignments to events are consistent with our notion of probability, but more thought is required to properly assign the probability of an event.

Many results follow from the above axioms. If we add to them the definitions of conditional probability and independence, we get the following useful results:

- Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. If A and B are disjoint, then $A \cap B = \emptyset$ and $P(A \cup B) = P(A) + P(B)$.
- Multiplication Rule: $P(A \cap B) = P(B \mid A)\,P(A)$, and if A and B are independent, $P(A \cap B) = P(B)\,P(A)$.
- $P(A^c) = 1 - P(A)$, where $A^c$ denotes the complement of event A.
- If $A \subseteq B$, then $P(A) \le P(B)$.
- $P(A \cup B) = P(B \cup A)$ and $P(A \cap B) = P(B \cap A)$.

Note: If A and B are disjoint, then $P(A \cap B) = 0$. The converse need not hold in general: two events can overlap in outcomes that have probability zero.
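The rules above can be checked by brute force on a small sample space. The sketch below assumes a single roll of a fair die with equally likely outcomes; the events A and B are arbitrary choices for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die (an assumed toy sample space)

def prob(event):
    """P(E) for equally likely outcomes: |E| / |Omega|."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
addition_lhs = prob(A | B)
addition_rhs = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(A^c) = 1 - P(A)
complement = prob(omega - A)

# Conditional probability, so that P(A and B) = P(B | A) P(A)
cond_B_given_A = prob(A & B) / prob(A)
```

Using `Fraction` keeps every probability exact, so the two sides of each rule can be compared with `==` rather than a floating-point tolerance.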

The following theorem highlights the importance of counting and is used extensively in sampling.

Theorem: Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_N\}$ consist of N equally likely elementary outcomes, where N is finite. Let E be any event in Ω. Then

$P(E) = \dfrac{\text{number of } \omega\text{'s in } E}{N}$

To help us with counting we will use the binomial coefficient. Suppose we have n distinct objects and we would like to choose k of them. If we can only choose each object once (no replacement) and if order does not matter [choosing the objects (a, b) is the same as choosing (b, a)], then how many distinct sets of k objects can we select from the n of them? (E.g., select k = 6 balls from n = 49, or select 100 square meters from 102 million square meters of forest.)

How many ways can we choose k objects from n distinct objects?

$\binom{n}{k} = \dfrac{n!}{(n-k)!\,k!}$, where $k! = k(k-1)(k-2)\cdots 2 \cdot 1$

Exercise: What is the probability of choosing the winning Lotto 6/49 number with a single ticket?

Birthday Problem: Suppose there are n people in a room. What is the probability that at least 2 will have the same birthday?

One of the keys to probability sampling is that each sample has a known probability of being selected. As such we can find the probability of each unit being selected. In some cases, it is easier to determine the probability of each unit being selected and use this to determine the probability of obtaining a specific sample. We use the following notation:

$\pi_i = P(i\text{th unit is in the sample})$

Examples: Consider this class as a population. If we run an SRS without replacement of 10 units, what is the probability that you will be selected?
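The counting questions above lend themselves to a short numerical check. This is only a sketch: the birthday function assumes 365 equally likely birthdays and ignores leap years, and the class size of 40 used in the test of the inclusion probability is a made-up number.

```python
import math

# Number of ways to choose 6 numbers out of 49 when order does not matter.
lotto_tickets = math.comb(49, 6)
p_win = 1 / lotto_tickets  # one ticket, all combinations equally likely

def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

def srs_inclusion_probability(n, N):
    """pi_i under SRS without replacement: C(N-1, n-1) / C(N, n), which equals n/N."""
    return math.comb(N - 1, n - 1) / math.comb(N, n)

print(lotto_tickets)  # 13983816
```

The inclusion probability is obtained by counting: of the C(N, n) equally likely samples, C(N-1, n-1) contain a given unit, and the ratio simplifies to n/N.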

If we choose 5 individuals of each gender, what is the probability that you will be selected? What type of sampling method is this?

Example: Stats Canada is running a survey of hospital patients. They run a multistage sampling method in which they randomly select 5 provinces. Within these provinces they select 2 cities and 2 rural areas. Within these they select 3 hospitals or clinics and then randomly select 2 departments. Finally they select 5 patients from these. Alice is in the Vancouver General recovering in the ICU. How can we calculate the probability of her being selected?

In the rare event of an odd sampling method that is not as systematic as these, but for which the list of possible (equally likely) samples is available, simply count the number of samples containing the ith unit and divide by the number of possible samples.

Random Variables

Probabilities of events are of limited use to us on their own, but random variables are based on them and are themselves the basis for all inferential methods we will consider.

Recall that a random variable has an expectation and a variance. For a discrete random variable X, the expected value is

$E(X) = \sum_x x f(x) = \sum_x x\,P(X = x)$

and the variance is

$\mathrm{Var}(X) = E\big[(X - \mu_x)^2\big] = \sigma_x^2 = \sum_x (x - \mu_x)^2 f(x)$

Example: A population has the following four elements: U = {1, 4, 6, 9}. What are the expected value and variance of the sample average for samples of size two?
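One way to approach the example is to list every possible sample and apply the definitions of expectation and variance directly. The sketch below assumes the samples are drawn by SRS without replacement, which the example does not state explicitly.

```python
from fractions import Fraction
from itertools import combinations

U = [1, 4, 6, 9]
n = 2

# Assumption: SRS without replacement, so all C(4, 2) = 6 samples are equally likely.
samples = list(combinations(U, n))
means = [Fraction(sum(s), n) for s in samples]

# E(ybar) and Var(ybar) straight from the definitions, with f(x) = 1/6 per sample.
expected_mean = sum(means) / len(means)
variance_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

print(expected_mean)   # 5
print(variance_mean)   # 17/6
```

The expected value of the sample average equals the population mean (20/4 = 5), which is a first hint that the sample mean is unbiased under this design.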

Definitions and Properties

- For a function g, $E[g(X)] = \sum_x g(x)\,P(X = x)$.
- If a and b are constants, then $E[aX + b] = aE[X] + b$.
- If X and Y are independent, then $E[XY] = E[X]E[Y]$.
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.

Question: Is $E[X/Y] = E[X]/E[Y]$?

Sampling Distributions

Given a population of size N, we use the following notation/equations for the population characteristics.

$t = \sum_{i=1}^{N} y_i$ = Population Total

$\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$ = Population Mean

$p = \frac{1}{N}\sum_{i=1}^{N} y_i$ = Population Proportion, when y is dichotomous (0, 1)

$S^2 = \sigma_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$ = Variance of Population values

For the special case of a dichotomous RV, $\sigma_y^2 = p(1 - p)$ (with the N - 1 divisor used above, $S^2 = \frac{N}{N-1}\,p(1-p)$ exactly, which is nearly the same for large N).

For each of these population characteristics, we have sample analogs. Suppose we draw a sample S of size n from $U = \{y_1, y_2, \ldots, y_N\}$, where N > n. Here $y_i$ is the value of Y (the random variable) for the ith element in the sample S.
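A small numerical experiment can help with the question about $E[X/Y]$. The sketch below uses two hypothetical independent variables, each uniform on {1, 2}; these distributions are assumptions chosen only because they are easy to enumerate.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent variables, each uniform on {1, 2}.
x_values = [1, 2]
y_values = [1, 2]
pairs = list(product(x_values, y_values))  # four equally likely (x, y) pairs

E_X = Fraction(sum(x_values), len(x_values))
E_Y = Fraction(sum(y_values), len(y_values))

# Expectations of functions of (X, Y), computed over the joint distribution.
E_ratio = sum(Fraction(x, y) for x, y in pairs) / len(pairs)
E_product = sum(Fraction(x * y) for x, y in pairs) / len(pairs)

print(E_ratio, E_X / E_Y)      # 9/8 1
print(E_product, E_X * E_Y)    # 9/4 9/4
```

In this toy case the product rule for independent variables holds, but the ratio of expectations does not match the expectation of the ratio. This matters later because many survey estimators are ratios of sample quantities.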

$t_s = \sum_{i \in S} y_i$ = Sample Total

$\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$ = Sample Mean

$\hat{p} = \frac{1}{n}\sum_{i \in S} y_i$ = Sample Proportion, when y is dichotomous (0, 1)

$s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$ = Sample Variance

For the special case of a dichotomous RV, $\sigma_y^2 = p(1 - p)$.

Note: Sample quantities exhibit variability. They are examples of summary statistics; their distributions are called sampling distributions. The way in which we obtain the sample will dictate the sampling distribution (more on this later). We don't know the population characteristics; if we did, we wouldn't sample.

Let's use these definitions to motivate estimation techniques.

Example: Suppose we have 200 bags of mail addressed to Kris Kringle. We want to know more, so we sample 20 bags. The number of letters in each of the sampled bags:

{655, 721, 687, 547, 632, 611, 589, 651, 432, 752, 671, 619, 633, 631, 711, 712, 598, 705, 606, 669}
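As a starting point for the mail-bag example, the sample analogs can be computed directly from the data. Scaling the sample mean up to the 200 bags is one natural estimate of the total number of letters; treating it as the estimator of choice is an assumption at this stage of the notes.

```python
import statistics

N = 200  # bags in the population
letters = [655, 721, 687, 547, 632, 611, 589, 651, 432, 752,
           671, 619, 633, 631, 711, 712, 598, 705, 606, 669]
n = len(letters)

sample_total = sum(letters)                      # t_s
sample_mean = statistics.mean(letters)           # ybar
sample_variance = statistics.variance(letters)   # s^2, divides by n - 1

# Natural estimate of the population total: scale the sample mean up to N bags.
estimated_total = N * sample_mean

print(sample_total, sample_mean)
```

A different sample of 20 bags would give different values, which is exactly the sampling variability described in the note above.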

Review of Estimators

With any point estimate, we'd like to know its properties; otherwise there would be many choices of estimates.

Estimator: A function of random variables used to estimate a parameter. It is itself random.

Estimate: The realization of an estimator. It is a fixed numerical value.

Thus, we cannot say anything about the estimate itself; we can only qualify the method used to obtain it, namely the estimator. It makes no sense to speak of the variability of an estimate.

Unbiased: An estimator is unbiased if its expected value is equal to the characteristic it is trying to estimate.

Consistent: An estimator is called consistent if it converges to the parameter as n grows; for an unbiased estimator this holds when its variance converges to 0 as n tends to $\infty$.

MVUE: An estimator is the minimum variance unbiased estimator if it has the smallest variance amongst unbiased estimators.

Knowing whether the estimator has these qualities is only half the battle. By itself an estimate is of little value. The standard error allows us to understand how wrong we're likely to be. We call the square root of the variance of an estimator the Standard Error. Much of this course deals with finding the appropriate standard errors for the various estimators we encounter.

Naturally, an estimator is biased if its average value differs from the true parameter: $\mathrm{Bias}[\hat{\theta}] = E[\hat{\theta}] - \theta$. The question is: Is unbiased always better? There can be occasions where a biased estimator tends to fall closer to the parameter on average than does the unbiased estimator. The mean squared error captures this notion:

$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2$

Unfortunately, we typically can't measure an estimator's bias, so we favor the MVUE.

Simple Random Sampling

As previously defined, a simple random sample is a sample where every possible sample of size n has equal probability of being sampled. There are, however, two forms of SRS: with or without replacement. A simple random sample without replacement (SRS) allows every unit in the population to appear at most once in each sample. By comparison, a simple random sample with replacement (SRSWR) allows any unit in the population to appear as many as n times in any sample.

To run a sample without replacement, first select a unit randomly such that every unit has equal probability of being selected. Now select the second unit such that all remaining units have equal probability of being selected, and so forth.

To run an SRSWR, select a unit with probability $1/N$, replace it, and then select anew with each unit again having equal probability of being selected.

Remarks:

- It's pretty intuitive that selecting the same unit twice, thrice or more is of little informative use. Sampling without replacement is more efficient and it is the method we will use throughout the course. The example below will serve as further motivation to this end.
- Within each method every unit has the same inclusion probability, but the value of that probability differs between the two methods.
- Is having an equal probability of inclusion enough to conclude that the sample is an SRS?
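The two selection procedures can be sketched with Python's standard library, along with the inclusion probability of a single unit under each. The population of 10 units and the sample size of 4 are made-up values; the helper names are hypothetical.

```python
import random

def draw_srs(units, n, rng):
    """SRS without replacement: each unit appears at most once."""
    return rng.sample(units, n)

def draw_srswr(units, n, rng):
    """SRS with replacement: n independent draws, each unit has probability 1/N."""
    return rng.choices(units, k=n)

def inclusion_prob_srs(n, N):
    return n / N

def inclusion_prob_srswr(n, N):
    # A unit is missed on every one of the n draws with probability (1 - 1/N)^n.
    return 1 - (1 - 1 / N) ** n

rng = random.Random(1)
population = list(range(1, 11))  # hypothetical population, N = 10
without = draw_srs(population, 4, rng)
with_repl = draw_srswr(population, 4, rng)
```

With N = 10 and n = 4, the inclusion probability is 0.4 without replacement and about 0.344 with replacement, because some draws are "wasted" on units already selected.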

Since SRS without replacement is better, we'll only consider this type of SRS after this chapter.

While SRS is simple conceptually and mathematically, it makes a strong assumption: all units are independent. There are many situations where this is not true, for example when sampling all individuals in a household on political affiliation, and ignoring this can lead to overconfidence. If there is structure to the population, there may be a better way of sampling which exploits and/or takes into account this structure. Thus, SRS should be used when there is no natural structure in the data or there is very little information available on the population.

Example (back to sampling distributions): We'll use a very trivial example to highlight some of the definitions we've discussed and to lead to further discussions. Consider the following population: U = {y_1 = kitten, y_2 = kitten, y_3 = puppy, y_4 = puppy}. Find the proportion of kittens and use the sampling distribution of the sample proportion to find $E[\hat{p}]$, $\mathrm{Var}[\hat{p}]$ and $\mathrm{MSE}[\hat{p}]$. We'll consider three sampling methods: SRS, SRSWR and Systematic Sampling.
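Because the population has only four units, the sampling distribution of the sample proportion can be enumerated exactly for each design. The sketch below assumes samples of size n = 2 (the notes leave the size open) and, for systematic sampling, takes every second unit of the list in the order given.

```python
from fractions import Fraction
from itertools import combinations, product

U = ["kitten", "kitten", "puppy", "puppy"]
N, n = len(U), 2  # n = 2 is an assumption; the notes leave the sample size open
p = Fraction(U.count("kitten"), N)  # true proportion of kittens

def p_hat(indices):
    return Fraction(sum(U[i] == "kitten" for i in indices), len(indices))

def summarize(samples):
    """E, Var and MSE of p_hat when the listed samples are equally likely."""
    values = [p_hat(s) for s in samples]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    mse = sum((v - p) ** 2 for v in values) / len(values)
    return mean, var, mse

designs = {
    "SRS": list(combinations(range(N), n)),        # 6 unordered samples
    "SRSWR": list(product(range(N), repeat=n)),    # 16 ordered draws
    "systematic": [tuple(range(start, N, N // n))  # every 2nd unit, random start
                   for start in range(N // n)],
}
results = {name: summarize(samples) for name, samples in designs.items()}

for name, (mean, var, mse) in results.items():
    print(name, mean, var, mse)
```

All three designs are unbiased here, so the MSE equals the variance, and sampling without replacement has the smaller variance of the two SRS forms. The systematic result depends entirely on the ordering of the list: with kittens listed first, every systematic sample contains exactly one kitten.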

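Equations and Notation

Population Characteristics:

$t = \sum_{i=1}^{N} y_i$

$\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$; here p is a special case (for a dichotomous variable)

$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$

Characteristic Estimators and Estimators of their Variance:

$\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$, with $\mathrm{Var}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$

$s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$, with $\widehat{\mathrm{Var}}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

$SE(\bar{y}) = \sqrt{\widehat{\mathrm{Var}}(\bar{y})} = \sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$

$\widehat{CV}(\bar{y}) = \dfrac{SE(\bar{y})}{\bar{y}}$

$\hat{t} = N\bar{y}$, with $\mathrm{Var}(\hat{t}) = N^2\,\mathrm{Var}(\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\frac{S^2}{n}$ and $\widehat{\mathrm{Var}}(\hat{t}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

$\mathrm{Var}(\hat{p}) = \dfrac{N - n}{N - 1}\cdot\dfrac{p(1 - p)}{n}$, with $\widehat{\mathrm{Var}}(\hat{p}) = \left(1 - \frac{n}{N}\right)\dfrac{\hat{p}(1 - \hat{p})}{n - 1}$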
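The estimator formulas on this page translate almost line for line into code. The following is a minimal sketch, assuming SRS without replacement; the function names are hypothetical and the summary values passed in at the end are made up purely to exercise the functions.

```python
import math

def srs_estimates(N, n, ybar, s2):
    """Plug sample summaries into the SRS (without replacement) formulas."""
    fpc = 1 - n / N
    var_ybar = fpc * s2 / n          # estimated Var(ybar)
    se_ybar = math.sqrt(var_ybar)
    return {
        "se_mean": se_ybar,
        "cv_mean": se_ybar / ybar,
        "total": N * ybar,
        "se_total": N * se_ybar,     # sqrt(N^2 * estimated Var(ybar))
    }

def srs_var_proportion(N, n, p_hat):
    """Estimated Var(p_hat) = (1 - n/N) * p_hat * (1 - p_hat) / (n - 1)."""
    return (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)

# Hypothetical summaries, chosen only to exercise the functions.
est = srs_estimates(N=500, n=50, ybar=12.0, s2=16.0)
var_p = srs_var_proportion(N=500, n=50, p_hat=0.3)
```

Working from summaries rather than raw data keeps the sketch short; the same functions could be fed the mean and variance of any sample drawn by SRS.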

Dealing with Finite Populations

When sampling with replacement, no adjustments are required. We can continue to use the estimation methodology that we are already familiar with. However, when dealing with an SRS without replacement, we need to adjust the variance accordingly.

Finite Population Correction Term

The finite population correction term (fpc), $1 - n/N$, is used to lower the variance in accordance with how much of the population is found in the sample. For a formal derivation of the fpc see Lohr page 45.

Recall that the estimators are all summary statistics and as such follow a sampling distribution. We can intuitively understand that the larger the fraction of the population captured in the sample, the more informative it will be and the smaller the variance will be. Comparing the standard error for various sample sizes, the extreme cases help us understand this adjustment intuitively.

- If we only rely on the sample size to get the variance to reach zero, we'll need n to tend to $\infty$. With the fpc, the variance will reach 0 at n = N.
- Typically, the sample size is much smaller than the population size.
- The sample size should not be chosen based on the population size. A sample of size 100 is about as effective for getting information on a population of size 10,000 as on one of size 1,000,000.
- A rule of thumb: the fpc can be ignored if less than 5% of the population is in the sample.

Having explored the concepts of the fpc and the equations for SRS sampling, we can make the following two statements:

- The estimator $\bar{y}$ is unbiased for the population mean in an SRS.
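A few numbers make the extreme cases concrete. The sketch below evaluates the standard error of the sample mean under SRS without replacement, $\sqrt{(1 - n/N)\,S^2/n}$, for some illustrative sample and population sizes; S = 1 is an arbitrary scale chosen for the example.

```python
import math

def se_mean(n, N, S=1.0):
    """SE of ybar under SRS without replacement: sqrt((1 - n/N) * S^2 / n)."""
    return math.sqrt((1 - n / N) * S**2 / n)

def fpc(n, N):
    """Finite population correction term, 1 - n/N."""
    return 1 - n / N

# Same n = 100, very different population sizes (S = 1 is an arbitrary scale).
se_small_pop = se_mean(100, 10_000)
se_large_pop = se_mean(100, 1_000_000)

# Census: the whole population is sampled, so the variance is 0.
se_census = se_mean(10_000, 10_000)

print(round(se_small_pop, 4), round(se_large_pop, 4), se_census)
```

The two standard errors differ by less than one percent, which is the sense in which a sample of 100 is about equally informative for both population sizes. At a 5% sampling fraction the fpc is 0.95, so ignoring it inflates the standard error only slightly.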

- The estimator $\widehat{\mathrm{Var}}(\bar{y}) = \dfrac{s^2}{n}\left(1 - \dfrac{n}{N}\right)$ is unbiased for $\mathrm{Var}(\bar{y})$ in an SRS.

Confidence Intervals

Confidence intervals are becoming more and more prominent in statistical analysis (why would we favor a confidence interval over a hypothesis test?). We will use these extensively in this course.

The construction of a confidence interval is still the same as usual: $\hat{\theta} \pm z_{\alpha/2}\,SE(\hat{\theta})$. In order for this to be valid, we'll need to invoke the Central Limit Theorem.

The interpretation of a confidence interval becomes slightly more complicated in the context of a finite population. In order to keep the current interpretation, we'll have to imagine that the population being studied is part of a larger collection of populations, called a superpopulation.

Due to the fpc we must make adjustments to the sample size calculation equations, where ME is the desired margin of error:

$n = \dfrac{z_{\alpha/2}^2 S^2}{ME^2 + \dfrac{z_{\alpha/2}^2 S^2}{N}}$

Example: Over the last week, 1000 people have visited St. Paul's Hospital due to ailment. We sample 100 from a list of phone numbers using an SRS and ask: How long did you wait before being attended to by a doctor? Miraculously, everyone responds, leading to the following summary statistics: $\bar{y} = 2.8$ hours and s = 0.5 hours. Construct confidence intervals for the population mean $\bar{y}_U$ and the population total t. How large a sample size would be required for a second study if we want the margin of error for the total to be at most 50?
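The hospital example can be worked through numerically as follows. This is a sketch under two assumptions that the example does not state: a 95% confidence level (z = 1.96), and the use of the observed s = 0.5 as the planning value for S in the sample size formula.

```python
import math

N, n = 1000, 100
ybar, s = 2.8, 0.5
z = 1.96  # assumed 95% confidence level; the notes do not fix alpha

# Standard error of the mean with the fpc, then intervals for mean and total.
se_mean = math.sqrt((1 - n / N) * s**2 / n)
ci_mean = (ybar - z * se_mean, ybar + z * se_mean)
ci_total = (N * ci_mean[0], N * ci_mean[1])

def sample_size(margin, S, N, z=1.96):
    """n = z^2 S^2 / (ME^2 + z^2 S^2 / N), rounded up."""
    return math.ceil(z**2 * S**2 / (margin**2 + z**2 * S**2 / N))

# A margin of 50 on the total corresponds to 50 / N on the mean.
n_required = sample_size(margin=50 / N, S=s, N=N)

print(ci_mean, ci_total, n_required)
```

The interval for the total is simply the interval for the mean scaled by N, since $\hat{t} = N\bar{y}$. Under these assumptions the required sample size comes out to 278, noticeably less than the roughly 385 that the formula without the fpc would suggest.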

Systematic Sampling

In this sampling scheme we choose every kth element of the frame, with a random start on one of the first k units. Here $k = N/n$. Is this SRS?

It can be used as a substitute for SRS if the sampling frame is randomized or if there is no sampling frame.

Example: A video store clerk wants to know how many videos customers rent on average every month. She samples every 15th customer starting with the 5th.

Care should be taken when using a systematic sampling scheme. Depending on the nature of the list, systematic sampling can be worse than, equivalent to, or (in trivial cases) better than an SRS.
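The scheme described above can be sketched in a few lines. The frame of 300 customers is hypothetical, and the function assumes N is an exact multiple of n so that k = N/n is an integer.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Every k-th unit of the frame after a random start among the first k units."""
    k = len(frame) // n          # assumes N is a multiple of n, so k = N / n exactly
    start = rng.randrange(k)     # 0-based random start
    return frame[start::k]

customers = list(range(1, 301))  # hypothetical frame of N = 300 customers
sample = systematic_sample(customers, n=20, rng=random.Random(7))

print(sample[:3])
```

Only k distinct samples are possible under this scheme, one per starting position, so most subsets of size n can never be drawn even though every unit has the same inclusion probability 1/k.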
