Chapter 3: Element sampling design: Part 1

Jae-Kwang Kim, Fall 2014

1. Simple random sampling
2. SRS with replacement
3. Systematic sampling

Kim Ch. 3: Element sampling design: Part 1 Fall, 2014 2 / 31

Simple Random Sampling

Motivation: choose $n$ units from $N$ units without replacement.
1. Each subset of $n$ distinct units is equally likely to be selected.
2. There are $\binom{N}{n}$ samples of size $n$ from $N$.
3. Give equal probability of selection to each subset with $n$ units.

Definition (sampling design for SRS):
$$P(A) = \begin{cases} 1\big/\binom{N}{n} & \text{if } |A| = n,\\ 0 & \text{otherwise.} \end{cases}$$

Lemma. Under SRS, the inclusion probabilities are
$$\pi_i = \frac{n}{N}, \qquad \pi_{ij} = \frac{n(n-1)}{N(N-1)} \quad \text{for } i \neq j.$$
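For a small population, the lemma can be checked by brute force. The sketch below (the helper name `srs_inclusion_probs` is hypothetical, and units are labelled 0, ..., N-1) enumerates all $\binom{N}{n}$ samples, gives each one probability $1/\binom{N}{n}$, and accumulates exact inclusion probabilities with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def srs_inclusion_probs(N, n):
    """Exact pi_i and pi_ij under SRS, by enumerating every sample of size n
    from the units 0, ..., N-1 (hypothetical helper for illustration)."""
    p_sample = Fraction(1, comb(N, n))   # P(A) = 1 / C(N, n) when |A| = n
    pi = [Fraction(0)] * N
    pi2 = {}
    for A in combinations(range(N), n):
        for i in A:
            pi[i] += p_sample
        for i, j in combinations(A, 2):  # pairs with i < j
            pi2[(i, j)] = pi2.get((i, j), Fraction(0)) + p_sample
    return pi, pi2


N, n = 6, 3
pi, pi2 = srs_inclusion_probs(N, n)
print(pi[0], pi2[(0, 1)])  # 1/2 1/5
```

Here $n/N = 3/6 = 1/2$ and $n(n-1)/\{N(N-1)\} = 6/30 = 1/5$, matching the lemma.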

Theorem. Under the SRS design, the HT estimator
$$\hat{Y}_{HT} = \frac{N}{n}\sum_{i \in A} y_i = N\bar{y}$$
is unbiased for $Y$ and has variance
$$V(\hat{Y}_{HT}) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) S^2,$$
where
$$S^2 = \frac{1}{2}\,\frac{1}{N}\,\frac{1}{N-1}\sum_{i=1}^{N}\sum_{j=1}^{N}(y_i - y_j)^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2.$$

Theorem (cont'd). Also, the SYG variance estimator is
$$\hat{V}(\hat{Y}_{HT}) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) s^2,$$
where
$$s^2 = \frac{1}{n-1}\sum_{i \in A}(y_i - \bar{y})^2.$$
Since the SYG estimator is unbiased under this fixed-size design, it follows that under SRS, $E(s^2) = S^2$.
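The theorem and its continuation can be verified exactly on a toy population. This sketch assumes a hypothetical six-unit population, enumerates every SRS sample of size 3, and compares the design expectation and variance of $N\bar{y}$, and the design expectation of $s^2$, with the formulas above; rational arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

y = [3, 7, 1, 8, 5, 6]   # hypothetical population values
N, n = len(y), 3
Y = sum(y)               # population total
Ybar = Fraction(Y, N)

# The two equivalent forms of S^2 given in the theorem
S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
S2_pairs = Fraction(sum((yi - yj) ** 2 for yi in y for yj in y), 2 * N * (N - 1))

p = Fraction(1, comb(N, n))  # P(A) = 1 / C(N, n)
estimates, E_s2 = [], Fraction(0)
for A in combinations(range(N), n):
    ys = [y[i] for i in A]
    ybar = Fraction(sum(ys), n)
    estimates.append(N * ybar)                        # HT estimator N * ybar
    E_s2 += p * sum((v - ybar) ** 2 for v in ys) / (n - 1)

E_hat = sum(p * e for e in estimates)
V_design = sum(p * (e - Y) ** 2 for e in estimates)
V_formula = Fraction(N * N, n) * (1 - Fraction(n, N)) * S2
print(E_hat == Y, V_design == V_formula, E_s2 == S2, S2 == S2_pairs)
# True True True True
```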

Remark (under SRS)
1. $1 - n/N$ is often called the finite population correction (FPC) term. The FPC term can be ignored (FPC $\doteq 1$) if the sampling rate $n/N$ is small ($\le 0.05$) or for conservative inference.
2. For $n = 1$, the variance of the sample mean is
$$\frac{1}{1}\left(1 - \frac{1}{N}\right) S^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{Y})^2 \equiv \sigma_Y^2.$$
3. Central limit theorem: under some conditions,
$$\{\hat{V}(\hat{Y}_{HT})\}^{-1/2}(\hat{Y}_{HT} - Y) = \frac{\bar{y} - \bar{Y}}{\sqrt{\frac{1}{n}\left(1 - \frac{n}{N}\right)s^2}} \xrightarrow{d} N(0, 1).$$
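Remarks 1 and 3 combine into the usual normal-approximation interval for $\bar{Y}$. A minimal sketch, assuming the 95% normal quantile 1.96 and a hypothetical helper name `srs_mean_ci`; passing `fpc=False` drops the FPC, which widens the interval (the conservative option).

```python
import math


def srs_mean_ci(sample, N, z=1.96, fpc=True):
    """Normal-approximation interval for the population mean under SRS,
    using Vhat(ybar) = (1/n)(1 - n/N) s^2; the FPC factor is dropped if fpc=False."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((v - ybar) ** 2 for v in sample) / (n - 1)
    factor = (1 - n / N) if fpc else 1.0
    se = math.sqrt(factor * s2 / n)
    return ybar - z * se, ybar + z * se


print(srs_mean_ci([2, 4, 6, 8], N=8))   # hypothetical sample of 4 units from N = 8
```

When $n = N$ the FPC is zero and the interval collapses to the census mean.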

Remark (under SRS): sample size determination
1. Choose a target value $V$ for $V(\bar{y})$.
2. Choose $n$ as the smallest integer satisfying
$$\frac{1}{n}\left(1 - \frac{n}{N}\right) S^2 \le V.$$
For dichotomous $y$ (taking 0 or 1), we may use $S^2 \doteq P(1-P) \le 1/4$. A simple rule is $n \doteq d^{-2}$, where $d$ is the margin of error: taking $d \doteq 2\sqrt{V}$ (a 95% interval), $S^2 = 1/4$, and ignoring the FPC, the inequality gives $n \ge (1/4)/(d^2/4) = d^{-2}$.
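Solving the inequality for $n$ gives $n \ge n_0/(1 + n_0/N)$ with $n_0 = S^2/V$. The sketch below (hypothetical helper names `var_mean` and `srs_sample_size`) rounds that bound up and then nudges $n$ so that it is exactly the smallest integer meeting the target, guarding against floating-point rounding.

```python
import math


def var_mean(n, N, S2):
    """V(ybar) = (1/n)(1 - n/N) S^2 under SRS."""
    return (1 / n) * (1 - n / N) * S2


def srs_sample_size(S2, V, N):
    """Smallest integer n with (1/n)(1 - n/N) S^2 <= V.
    Solving the inequality gives n >= n0 / (1 + n0/N), where n0 = S^2 / V."""
    n0 = S2 / V                                   # size needed if the FPC is ignored
    n = max(1, math.ceil(n0 / (1 + n0 / N)))
    while n > 1 and var_mean(n - 1, N, S2) <= V:  # guard against rounding at the boundary
        n -= 1
    while var_mean(n, N, S2) > V:
        n += 1
    return n


# Dichotomous y with P = 1/2 (S^2 = 1/4) and margin of error d = 0.05,
# taking d = 2 * sqrt(V) as in the simple rule:
d = 0.05
print(srs_sample_size(0.25, (d / 2) ** 2, N=10**9))  # 400, i.e. about 1/d^2
print(srs_sample_size(0.25, (d / 2) ** 2, N=1000))   # 286: the FPC lowers n
```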

How do we select a simple random sample of size $n$ from the finite population?
- Draw-by-draw procedure
- Rejective Bernoulli sampling method
- Reservoir method
- Random sorting method

Draw-by-draw procedure

For example, consider $U = \{1, 2, \ldots, N\}$ and $n = 2$. In the first draw, select one element with equal probability. In the second draw, select one element with equal probability from $U \setminus \{a_1\}$, where $a_1$ is the element selected in the first draw. Let $a_2$ be the element selected in the second draw.

Draw-by-draw procedure (cont'd)
$$P(\{a_1, a_2\}) = P(a_1)\,P(a_2 \mid U \setminus \{a_1\}) + P(a_2)\,P(a_1 \mid U \setminus \{a_2\}) = \frac{1}{N}\cdot\frac{1}{N-1} + \frac{1}{N}\cdot\frac{1}{N-1} = \frac{2}{N(N-1)} = \binom{N}{2}^{-1}.$$
We can prove similar results for general $n$ (use mathematical induction).
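A sketch of the draw-by-draw procedure for general $n$, with a simulation check that each of the $\binom{4}{2} = 6$ samples appears about $1/6$ of the time. The function name is hypothetical; Python's `random.sample` already returns an SRS, and this version only spells out the individual draws.

```python
import random
from collections import Counter


def draw_by_draw(N, n, rng=random):
    """Draw-by-draw SRS from U = {1, ..., N}: at each of the n draws, pick one
    of the units not yet selected, each with equal probability."""
    remaining = list(range(1, N + 1))
    sample = []
    for _ in range(n):
        j = rng.randrange(len(remaining))   # equal probability over remaining units
        sample.append(remaining.pop(j))
    return frozenset(sample)


rng = random.Random(2014)
reps = 60_000
counts = Counter(draw_by_draw(4, 2, rng) for _ in range(reps))
for A, c in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(A), round(c / reps, 3))    # each should be close to 1/6
```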

Rejective Bernoulli sampling method
1. Apply Bernoulli sampling of expected size $n$: independently, $I_1, \ldots, I_N \sim \text{Bernoulli}(f)$, where $f = n/N$.
2. Check whether the realized sample size $\sum_{i=1}^{N} I_i$ equals $n$. If yes, accept the sample. Otherwise, go to Step 1.

Rejective Bernoulli sampling method (cont'd)

Justification:
$$P\left(I_1, I_2, \ldots, I_N \,\middle|\, \sum_{i=1}^{N} I_i = n\right) = \frac{\prod_{i=1}^{N} f^{I_i}(1-f)^{1-I_i}}{\binom{N}{n} f^{n}(1-f)^{N-n}} = \frac{1}{\binom{N}{n}} \quad \text{if } \sum_{i=1}^{N} I_i = n.$$
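A minimal sketch of the rejective method (hypothetical function name), which also counts attempts. By the denominator in the justification, each attempt succeeds with probability $\binom{N}{n} f^n (1-f)^{N-n}$, so the number of attempts is geometric; for $N = 4$, $n = 2$ that probability is $6/16 = 3/8$, about $8/3$ attempts on average.

```python
import random
from collections import Counter


def rejective_bernoulli(N, n, rng=random):
    """Repeat Bernoulli sampling with f = n/N on U = {1, ..., N} until the
    realized sample size equals n, then accept that sample."""
    f = n / N
    attempts = 0
    while True:
        attempts += 1
        sample = [k for k in range(1, N + 1) if rng.random() < f]  # I_k ~ Bernoulli(f)
        if len(sample) == n:
            return frozenset(sample), attempts


rng = random.Random(1983)
reps = 30_000
draws = [rejective_bernoulli(4, 2, rng) for _ in range(reps)]
counts = Counter(A for A, _ in draws)           # each of the 6 samples near 1/6
mean_attempts = sum(t for _, t in draws) / reps  # near 8/3
```

The method is simple but wasteful when $\binom{N}{n} f^n (1-f)^{N-n}$ is small, which happens for large $N$.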

Reservoir method (McLeod and Bellhouse, 1983)
1. The first $n$ units are selected into the sample.
2. For each $k = n+1, \ldots, N$:
   a. Select unit $k$ with probability $n/k$.
   b. If unit $k$ is selected, remove one element from the current sample with equal probability.
   c. Unit $k$ takes the place of the removed unit.

Note that the population size $N$ need not be known in advance. You can stop at any point in the process and you will obtain a simple random sample from the part of the finite population processed up to that point.
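The steps above translate directly into a single pass over a stream. In this sketch (hypothetical function name), overwriting a uniformly chosen slot implements "remove one element with equal probability and let unit $k$ take its place"; the stream is never measured, so $N$ need not be known.

```python
import random
from collections import Counter


def reservoir_srs(stream, n, rng=random):
    """Reservoir method: after k >= n units have been read, `sample` is an SRS
    of size n from those k units, so the stream length N need not be known."""
    sample = []
    for k, unit in enumerate(stream, start=1):
        if k <= n:
            sample.append(unit)               # step 1: first n units enter the sample
        elif rng.random() < n / k:            # step 2a: select unit k with probability n/k
            sample[rng.randrange(n)] = unit   # steps 2b-2c: unit k replaces a random member
    return sample


rng = random.Random(1983)
reps = 50_000
counts = Counter(frozenset(reservoir_srs(range(1, 6), 2, rng)) for _ in range(reps))
# 10 possible samples of size 2 from 5 units, each with probability near 1/10
```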

Random sorting method
1. A value of an independent uniform variable on $[0, 1]$ is allocated to each unit of the population.
2. The population is sorted in ascending (or descending) order of these values.
3. The first $n$ units of the sorted population are selected into the sample.
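A sketch of the random sorting method (hypothetical function name), keeping the three steps visible: attach a uniform key to each unit, sort by key, and take the first $n$.

```python
import random
from collections import Counter


def random_sort_srs(units, n, rng=random):
    """Random sorting method: give each unit an independent U(0,1) key,
    sort by the keys in ascending order, and keep the first n units."""
    keyed = [(rng.random(), u) for u in units]   # step 1: attach U(0,1) values
    keyed.sort(key=lambda pair: pair[0])         # step 2: sort in ascending order
    return [u for _, u in keyed[:n]]             # step 3: first n units


rng = random.Random(7)
reps = 50_000
counts = Counter(frozenset(random_sort_srs(range(1, 6), 2, rng)) for _ in range(reps))
# 10 possible samples, each with frequency near 1/10
```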

SRS with replacement

In with-replacement sampling, the order of sample selection is important.
- Ordered sample: $OS = (a_1, a_2, \ldots, a_n)$, where $a_i$ is the index of the element selected in the $i$-th with-replacement draw.
- Sample: $A = \{k : k = a_i \text{ for some } i,\ i = 1, 2, \ldots, n\}$.
- SRS with replacement: at the $i$-th draw, $a_i = k$ with probability $1/N$, $k = 1, \ldots, N$.
- The sample size $|A|$ is a random variable. Note that
$$\pi_k = \Pr(k \in A) = 1 - \Pr(k \notin A) = 1 - \left(1 - \frac{1}{N}\right)^n.$$
Thus,
$$n_0 = E|A| = \sum_{k=1}^{N}\pi_k = N - N\left(1 - \frac{1}{N}\right)^n < n \quad \text{for } n \ge 2.$$
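The formula for $n_0$ can be checked by enumerating all $N^n$ equally likely ordered samples and counting distinct units. Both helper names below are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_distinct_exact(N, n):
    """E|A| under SRS with replacement, by enumerating all N**n equally
    likely ordered samples (a_1, ..., a_n)."""
    total = sum(len(set(os)) for os in product(range(N), repeat=n))
    return Fraction(total, N ** n)


def expected_distinct_formula(N, n):
    """n_0 = sum_k pi_k = N - N (1 - 1/N)^n."""
    return N - N * (1 - Fraction(1, N)) ** n


print(expected_distinct_exact(4, 3), expected_distinct_formula(4, 3))  # 37/16 37/16
```

With $N = 4$ and $n = 3$ draws, on average only $37/16 \approx 2.31$ distinct units enter the sample.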

1. First, define
$$Z_i = y_{a_i} = \sum_{k=1}^{N} y_k\, I(a_i = k).$$
Note that $Z_1, \ldots, Z_n$ are independent random variables since the $n$ draws are independent.
2. $Z_1, \ldots, Z_n$ are identically distributed since the same probabilities are used at each draw, where $E(Z_i) = \bar{Y}$ and
$$V(Z_i) = \frac{1}{N}\sum_{k=1}^{N}(y_k - \bar{Y})^2 \equiv \sigma_y^2.$$
3. Thus, $Z_1, \ldots, Z_n$ are IID with mean $\bar{Y}$ and variance $\sigma_y^2$. Use $\bar{z} = \sum_{i=1}^{n} Z_i/n$ to estimate $\bar{Y}$.

Estimation of total
- Unbiased estimator of $Y$:
$$\hat{Y}_{SRSWR} = \frac{N}{n}\sum_{i=1}^{n} y_{a_i} = N\bar{y}_n.$$
- Variance:
$$V(\hat{Y}_{SRSWR}) = \frac{N^2}{n}\left(1 - \frac{1}{N}\right)S^2 = \frac{N^2}{n}\sigma_y^2 \ge V(\hat{Y}_{SRS}),$$
where $S^2 = (N-1)^{-1}\sum_{i=1}^{N}(y_i - \bar{Y}_N)^2 = N(N-1)^{-1}\sigma_y^2$. The inequality holds because $1 - 1/N \ge 1 - n/N$.
- Variance estimation:
$$\hat{V}(\hat{Y}_{SRSWR}) = \frac{N^2}{n}s^2, \quad \text{where } s^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_{a_i} - \bar{y}_n)^2.$$
Note that $E(s^2) = \sigma_y^2$.
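The claims above can be confirmed exactly by enumerating all $N^n$ ordered samples of a hypothetical four-unit population: $N\bar{y}_n$ is unbiased, its variance equals $N^2\sigma_y^2/n$, $E(s^2) = \sigma_y^2$, and the variance is at least that of SRS without replacement.

```python
from fractions import Fraction
from itertools import product

y = [3, 7, 1, 8]                 # hypothetical population values
N, n = len(y), 3
Y = sum(y)
Ybar = Fraction(Y, N)
sigma2 = sum((v - Ybar) ** 2 for v in y) / N          # sigma_y^2 (divisor N)
S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)        # S^2 (divisor N - 1)

p = Fraction(1, N ** n)          # every ordered sample has probability 1/N^n
E_hat = V_design = E_s2 = Fraction(0)
for os in product(range(N), repeat=n):
    z = [y[a] for a in os]                            # Z_i = y_{a_i}
    zbar = Fraction(sum(z), n)
    Y_hat = N * zbar                                  # Yhat_SRSWR = N * ybar_n
    E_hat += p * Y_hat
    V_design += p * (Y_hat - Y) ** 2
    E_s2 += p * sum((v - zbar) ** 2 for v in z) / (n - 1)

V_srs = Fraction(N * N, n) * (1 - Fraction(n, N)) * S2  # SRS without replacement
print(E_hat == Y, V_design == Fraction(N * N, n) * sigma2, E_s2 == sigma2, V_design >= V_srs)
# True True True True
```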

Systematic sampling

Setup:
1. Have $N$ elements in a list.
2. Choose a positive integer $a$, called the sampling interval. Let $n = \lfloor N/a \rfloor$; that is, $N = na + c$, where $c$ is an integer with $0 \le c < a$.
3. Select a random start $r$ from $\{1, 2, \ldots, a\}$ with equal probability.
4. The final sample is
$$A = \begin{cases} \{r, r+a, r+2a, \ldots, r+(n-1)a\} & \text{if } c < r \le a,\\ \{r, r+a, r+2a, \ldots, r+na\} & \text{if } 1 \le r \le c. \end{cases}$$
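A sketch of the selection step (hypothetical function name). Taking list positions $r, r+a, r+2a, \ldots$ while they do not exceed $N$ reproduces both cases of step 4 automatically; the optional argument `r` fixes the start for illustration.

```python
import random


def systematic_sample(N, a, r=None, rng=random):
    """Systematic sample of the list positions 1, ..., N with sampling interval a.
    If r is None, the random start is drawn from {1, ..., a} with equal probability."""
    if r is None:
        r = rng.randint(1, a)
    return list(range(r, N + 1, a))   # r, r + a, r + 2a, ... while <= N


# N = 23, a = 5 gives n = 4 and c = 3
print(systematic_sample(23, 5, r=2))  # [2, 7, 12, 17, 22]  (r <= c: n + 1 units)
print(systematic_sample(23, 5, r=4))  # [4, 9, 14, 19]      (r > c: n units)
```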

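The sample size can be random:
$$n_A = \begin{cases} n & \text{if } c < r \le a,\\ n + 1 & \text{if } r \le c. \end{cases}$$
Inclusion probabilities: $\pi_k = 1/a$ for every $k$, and $\pi_{kl} = 1/a$ if $|k - l|$ is a multiple of $a$ (so $k$ and $l$ always fall in the same sample), $\pi_{kl} = 0$ otherwise.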
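Enumerating the $a$ equally likely samples makes these inclusion probabilities concrete. In the sketch below (hypothetical helper name), any pair that never appears together is absent from `pi2`, meaning $\pi_{kl} = 0$.

```python
from fractions import Fraction
from itertools import combinations


def systematic_inclusion_probs(N, a):
    """Exact pi_k and pi_kl under systematic sampling, by enumerating the a
    equally likely samples. Pairs missing from pi2 have pi_kl = 0."""
    samples = [list(range(r, N + 1, a)) for r in range(1, a + 1)]
    pi = {k: Fraction(0) for k in range(1, N + 1)}
    pi2 = {}
    for A in samples:
        for k in A:
            pi[k] += Fraction(1, a)
        for k, l in combinations(A, 2):
            pi2[(k, l)] = pi2.get((k, l), Fraction(0)) + Fraction(1, a)
    return samples, pi, pi2


samples, pi, pi2 = systematic_inclusion_probs(23, 5)
print([len(A) for A in samples])                      # [5, 5, 5, 4, 4]
print(pi[1], pi2.get((1, 6), 0), pi2.get((1, 2), 0))  # 1/5 1/5 0
```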

Remark
- This is very easy to do.
- This is a probability sampling design.
- This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
- It can be viewed in two ways:
  - Picking one set of elements (which always go together) and measuring each one: later, we will call this cluster sampling.
  - Dividing the population into non-overlapping groups and choosing one element in each group: this is closely related to stratification.

Estimation

Partition the population into $a$ groups,
$$U = U_1 \cup U_2 \cup \cdots \cup U_a,$$
where the $U_r$ are disjoint ($U_r$ is the systematic sample with random start $r$). The population total is
$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a}\sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r, \quad \text{where } t_r = \sum_{k \in U_r} y_k.$$
Think of this as a finite population of $a$ elements with measurements $t_1, \ldots, t_a$.

Estimation (cont'd)

HT estimator:
$$\hat{Y}_{HT} = \frac{t_r}{1/a} = a\,t_r \quad \text{if } A = U_r.$$
Variance: note that we are doing SRS of size 1 from the population of $a$ elements $\{t_1, \ldots, t_a\}$. Hence
$$\mathrm{Var}(\hat{Y}_{HT}) = \frac{a^2}{1}\left(1 - \frac{1}{a}\right)S_t^2,$$
where
$$S_t^2 = \frac{1}{a-1}\sum_{r=1}^{a}(t_r - \bar{t})^2 \quad \text{and} \quad \bar{t} = \sum_{r=1}^{a} t_r/a.$$
When is the variance small?
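The "SRS of size 1 from $\{t_1, \ldots, t_a\}$" argument can be checked numerically. This sketch (hypothetical helper name) computes the design variance of $a\,t_r$ directly over the $a$ equally likely starts and compares it with $a^2(1 - 1/a)S_t^2$; `y` is assumed to be in list order and indexed from 0.

```python
from fractions import Fraction


def systematic_ht_variance(y, a):
    """Design variance of Yhat_HT = a * t_r under systematic sampling with
    interval a; y holds the population values in list order (0-based)."""
    t = [sum(y[r::a]) for r in range(a)]          # totals t_1, ..., t_a of U_1, ..., U_a
    Y = sum(t)
    # Direct enumeration: each random start has probability 1/a.
    V_enum = sum(Fraction(1, a) * (a * tr - Y) ** 2 for tr in t)
    # SRS of size 1 from {t_1, ..., t_a}: a^2 (1 - 1/a) S_t^2.
    tbar = Fraction(Y, a)
    St2 = sum((tr - tbar) ** 2 for tr in t) / (a - 1)
    V_formula = a ** 2 * (1 - Fraction(1, a)) * St2
    return V_enum, V_formula


print(systematic_ht_variance(list(range(1, 21)), a=4))  # both entries equal 500
```

The check does not require $N$ to be a multiple of $a$.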

Estimation (cont'd)

Now, assuming $N = na$,
$$V(\hat{Y}_{HT}) = a(a-1)S_t^2 = n^2 a\sum_{r=1}^{a}(\bar{y}_r - \bar{y}_u)^2,$$
where $\bar{y}_r = t_r/n$ and $\bar{y}_u = \bar{t}/n$.

ANOVA: with $U = \bigcup_{r=1}^{a} U_r$,
$$SST = \sum_{k \in U}(y_k - \bar{y}_u)^2 = \sum_{r=1}^{a}\sum_{k \in U_r}(y_k - \bar{y}_u)^2 = \sum_{r=1}^{a}\sum_{k \in U_r}(y_k - \bar{y}_r)^2 + n\sum_{r=1}^{a}(\bar{y}_r - \bar{y}_u)^2 = SSW + SSB.$$

$$V(\hat{Y}_{HT}) = na \cdot SSB = N \cdot SSB = N(SST - SSW).$$
- If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.
- If SSW is small (for fixed SST), then $V(\hat{Y}_{HT})$ is large.

The intraclass correlation coefficient $\rho$ measures the homogeneity of clusters:
$$\rho = 1 - \frac{n}{n-1}\,\frac{SSW}{SST}.$$
More details about $\rho$ will be covered in cluster sampling (Chapter 6).
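A small sketch of the ANOVA decomposition for systematic samples (hypothetical helper name, assuming $N = na$ with $n \ge 2$). For a hypothetical population with a linear trend, each systematic sample spans the whole range, so SSW is large, SSB is small, and $\rho$ is negative.

```python
from fractions import Fraction


def systematic_anova(y, a):
    """SST, SSW, SSB and rho for the a systematic samples U_r (assumes N = n * a, n >= 2)."""
    N = len(y)
    if N % a != 0:
        raise ValueError("this sketch assumes N = n * a")
    n = N // a
    groups = [y[r::a] for r in range(a)]               # U_1, ..., U_a
    ybar_u = Fraction(sum(y), N)
    ybar_r = [Fraction(sum(g), n) for g in groups]
    SST = sum((v - ybar_u) ** 2 for v in y)
    SSW = sum((v - m) ** 2 for g, m in zip(groups, ybar_r) for v in g)
    SSB = n * sum((m - ybar_u) ** 2 for m in ybar_r)
    rho = 1 - Fraction(n, n - 1) * SSW / SST           # intraclass correlation
    return SST, SSW, SSB, rho


y = list(range(1, 21))                                 # hypothetical linear trend, N = 20
SST, SSW, SSB, rho = systematic_anova(y, a=4)
print(SST, SSW, SSB, rho, len(y) * SSB)                # 665 640 25 -27/133 500
```

The last value is $N \cdot SSB$, the design variance $V(\hat{Y}_{HT})$.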

Comparison between systematic sampling (SY) and SRS

How does SY compare to SRS when the population is sorted in each of the following ways?
1. Random ordering: intuitively, the two should perform about the same.
2. Linear ordering: SY should be better than SRS.
3. Periodic ordering: if the period equals $a$, SY can be terrible.
4. Autocorrelated ordering: successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.

How to quantify?
$$V_{SRS}(\hat{Y}_{HT}) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\frac{1}{N-1}\sum_{k=1}^{N}(y_k - \bar{Y}_N)^2,$$
$$V_{SY}(\hat{Y}_{HT}) = n^2 a\sum_{r=1}^{a}(\bar{y}_r - \bar{y}_u)^2.$$
Cochran (1946) introduced a superpopulation model to deal with this problem (treating $y_k$ as a random variable).
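The two formulas can be evaluated for a fixed list to illustrate orderings 2 and 3 of the comparison. A sketch with hypothetical helper names and two hypothetical populations of size $N = 20$ with $a = 4$, $n = 5$: a linear trend, where SY wins, and a period equal to $a$, where every systematic sample is constant and SY loses badly.

```python
from fractions import Fraction


def v_srs(y, n):
    """V_SRS(Yhat_HT) = (N^2/n)(1 - n/N) S^2 for the population values y."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N * N, n) * (1 - Fraction(n, N)) * S2


def v_sy(y, a):
    """V_SY(Yhat_HT) = n^2 a sum_r (ybar_r - ybar_u)^2, assuming N = n * a."""
    N = len(y)
    n = N // a
    ybar_u = Fraction(sum(y), N)
    ybar_r = [Fraction(sum(y[r::a]), n) for r in range(a)]
    return n * n * a * sum((m - ybar_u) ** 2 for m in ybar_r)


a, n = 4, 5                                      # N = 20
linear = list(range(1, 21))                      # linear ordering
periodic = [1, 2, 3, 4] * 5                      # period equal to a
print(v_srs(linear, n), v_sy(linear, a))         # 2100 500
print(v_srs(periodic, n), v_sy(periodic, a))     # 1500/19 500
```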

Example: superpopulation model for a population in random order. Denote the model by $\zeta$: $\{y_k\} \stackrel{iid}{\sim} (\mu, \sigma^2)$. Then
$$E_\zeta\{V_{SRS}(\hat{Y}_{HT})\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2,$$
$$E_\zeta\{V_{SY}(\hat{Y}_{HT})\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2.$$
Thus, the model expectations of the design variances are the same under the IID model.
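A Monte Carlo sketch of this example: generate many populations from an IID model, compute both design variances for each, and average. The model only specifies the mean and variance, so normal draws, the parameter values, and the seed are illustrative assumptions, as are the helper names. With $N = na$, both expectations equal $N(a-1)\sigma^2$, here $100 \cdot 9 \cdot 100 = 90000$.

```python
import random
from statistics import fmean


def v_srs(y, n):
    """Design variance of N * ybar under SRS of size n."""
    N = len(y)
    m = fmean(y)
    S2 = sum((v - m) ** 2 for v in y) / (N - 1)
    return N * N / n * (1 - n / N) * S2


def v_sy(y, a):
    """Design variance of the HT estimator under systematic sampling (N = n * a)."""
    N = len(y)
    n = N // a
    m = fmean(y)
    return n * n * a * sum((fmean(y[r::a]) - m) ** 2 for r in range(a))


def model_expectations(N=100, a=10, mu=50.0, sigma=10.0, reps=4000, seed=1946):
    """Monte Carlo averages of V_SRS and V_SY over populations generated from
    an IID model; normal draws are an illustrative assumption."""
    rng = random.Random(seed)
    n = N // a
    srs, sy = [], []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(N)]   # y_k iid (mu, sigma^2)
        srs.append(v_srs(y, n))
        sy.append(v_sy(y, a))
    theory = N * N / n * (1 - n / N) * sigma ** 2      # (N^2/n)(1 - n/N) sigma^2
    return fmean(srs), fmean(sy), theory


print(model_expectations())   # all three values should be close to 90000
```

Individual populations can still favor either design; only the averages over the model agree.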