Bootstrap inference for the finite population total under complex sampling designs

Zhonglei Wang (joint work with Dr. Jae Kwang Kim)
Center for Survey Statistics and Methodology, Iowa State University
Jan. 16, 2018

Outline
1. Introduction
2. A brief review of some sampling designs
3. Bootstrap methods for complex sampling designs
4. Simulation studies
5. Conclusions

1. Introduction

Introduction
- The bootstrap is popular.
  - It is easy to implement.
  - It is more accurate than the Wald-type method (Hall, 1992, Section 3.3).
- The classical bootstrap is not applicable under most sampling designs.
  - Rao and Wu (1988) discussed a rescaling bootstrap method under stratified random sampling.
  - Sitter (1992) considered a mirror-match bootstrap method for sampling designs without replacement.

Introduction (Cont'd)
- The goals of this study:
  - Propose bootstrap methods for three commonly used sampling designs: Poisson sampling, simple random sampling (SRS), and probability-proportional-to-size (PPS) sampling.
  - Study the theoretical properties of the proposed methods.

2. A brief review of some sampling designs

Sampling designs
- Finite population $\mathcal{F}_N = \{y_1, \ldots, y_N\}$ with a known size $N$.
- Parameter of interest: $Y = \sum_{i=1}^{N} y_i$ (or, equivalently, $\bar{Y} = N^{-1} Y$).
- For Poisson sampling and SRS:
  - Let $I_i$ denote the sample indicator.
  - Let $\pi_i = E(I_i)$ denote the first-order inclusion probability.
  - Under Poisson sampling, a sample is obtained from $N$ independent Bernoulli trials, that is, $I_i \sim \mathrm{Ber}(\pi_i)$. Let $n_0 = \sum_{i=1}^{N} \pi_i$.
  - Under SRS, a without-replacement sample of size $n$ is selected with equal probabilities, that is, $\pi_i = n N^{-1}$.
  - Let $\hat{Y}_{\mathrm{Poi}} = \sum_{i=1}^{N} I_i \pi_i^{-1} y_i$ be the Horvitz-Thompson estimator of $Y$ under Poisson sampling; $\hat{Y}_{\mathrm{SRS}}$ is defined similarly.
  - Let $\hat{V}_{\mathrm{Poi}}$ and $\hat{V}_{\mathrm{SRS}}$ be the Horvitz-Thompson variance estimators for the two designs.
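The Poisson-sampling quantities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population values, the constant inclusion probabilities, the seed, and the function names are all hypothetical, and the variance estimator is written in its usual textbook form for Poisson sampling, $\sum_{i \in \text{sample}} (1 - \pi_i)\pi_i^{-2} y_i^2$, which the slides do not spell out.

```python
import random

def poisson_sample(pi, rng):
    """Poisson sampling: unit i enters the sample independently with probability pi[i]."""
    return [i for i, p in enumerate(pi) if rng.random() < p]

def ht_total(y, pi, sample):
    """Horvitz-Thompson estimator of the total: sum of y_i / pi_i over sampled units."""
    return sum(y[i] / pi[i] for i in sample)

def ht_variance_poisson(y, pi, sample):
    """Usual variance estimator under Poisson sampling (assumed form, see lead-in)."""
    return sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in sample)

rng = random.Random(2018)                           # arbitrary seed
y = [rng.expovariate(1 / 10) for _ in range(500)]   # hypothetical population values
pi = [0.2] * 500                                    # hypothetical probabilities, n_0 = 100
sample = poisson_sample(pi, rng)
print(ht_total(y, pi, sample), ht_variance_poisson(y, pi, sample))
```

Note that the realized sample size is random under Poisson sampling; only its expectation $n_0$ is fixed.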

Sampling designs (Cont'd)
- For PPS sampling:
  - Let $p_i \in (0, 1)$ be the selection probability of $y_i$, with $\sum_{i=1}^{N} p_i = 1$.
  - A sample of size $n$ is obtained by independently selecting a single element from the same finite population $n$ times.
  - Let $\hat{Y}_{\mathrm{PPS}} = n^{-1} \sum_{i=1}^{n} z_i$ be the Hansen-Hurwitz estimator of $Y$, where $z_i = p_{a,i}^{-1} y_{a,i}$, $a_i$ is the index of the element selected on the $i$-th draw, and $p_{a,i} = p_k$ and $y_{a,i} = y_k$ if $a_i = k$.
  - Let $\hat{V}_{\mathrm{PPS}}$ be the design-unbiased variance estimator (Fuller, 2009, Section 1.2.5).
- Let $T_{\mathrm{Poi}} = \hat{V}_{\mathrm{Poi}}^{-1/2} (\hat{Y}_{\mathrm{Poi}} - Y)$ for Poisson sampling; $T_{\mathrm{SRS}}$ and $T_{\mathrm{PPS}}$ are defined similarly.
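A minimal Python sketch of with-replacement PPS sampling and the Hansen-Hurwitz estimator follows. The function names are hypothetical, and the variance estimator is assumed to take the usual with-replacement form $\{n(n-1)\}^{-1} \sum_{i=1}^{n} (z_i - \hat{Y}_{\mathrm{PPS}})^2$; the slides only cite Fuller (2009) for it.

```python
import random

def pps_draws(p, n, rng):
    """Select n indices independently; index k has probability p[k] on every draw."""
    return rng.choices(range(len(p)), weights=p, k=n)

def hansen_hurwitz(y, p, draws):
    """Return (estimate of the total, variance estimate) from with-replacement draws."""
    z = [y[k] / p[k] for k in draws]          # z_i = y_{a,i} / p_{a,i}
    n = len(z)
    y_hat = sum(z) / n
    v_hat = sum((zi - y_hat) ** 2 for zi in z) / (n * (n - 1))
    return y_hat, v_hat

rng = random.Random(1)                         # arbitrary seed
y = [3.0, 8.0, 1.0, 12.0, 6.0]                 # hypothetical population
p = [0.1, 0.3, 0.1, 0.3, 0.2]                  # hypothetical selection probabilities
print(hansen_hurwitz(y, p, pps_draws(p, 3, rng)))
```

When $p_i$ is exactly proportional to $y_i$, every $z_i$ equals $Y$ and the variance estimate is zero, which is the usual motivation for choosing a size measure correlated with $y$.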

3. Bootstrap methods for complex sampling designs

Bootstrap method for Poisson sampling
1. Based on the current sample of size $n$, generate $(N_1^*, \ldots, N_n^*)$ from a multinomial distribution $\mathrm{MN}(N; \rho)$, where $\rho = (\rho_1, \ldots, \rho_n)$ and $\rho_i \propto \pi_i^{-1}$.
2. For each $i = 1, \ldots, n$, generate $m_i^*$ independently from a binomial distribution $\mathrm{Bin}(N_i^*, \pi_i)$. The bootstrap sample consists of $m_i^*$ replicates of $y_i$ under Poisson sampling.
3. Repeat the two steps above independently $M$ times.
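One replicate of the multinomial and binomial steps above might be sketched as follows. The function name and the example inputs are hypothetical, and the binomial draw is simulated as a sum of Bernoulli trials so that only the Python standard library is needed.

```python
import random
from collections import Counter

def bootstrap_poisson_counts(pi, N, rng):
    """One bootstrap replicate for a Poisson sample with inclusion probabilities pi.

    Returns a list m where m[i] is the number of copies of the i-th sampled
    element that end up in the bootstrap sample.
    """
    n = len(pi)
    # Step 1: multinomial MN(N; rho) with rho_i proportional to 1 / pi_i.
    picks = rng.choices(range(n), weights=[1 / p for p in pi], k=N)
    n_star = Counter(picks)                    # n_star[i] copies of element i
    # Step 2: Binomial(n_star[i], pi_i), simulated as a sum of Bernoulli(pi_i) trials.
    return [sum(rng.random() < pi[i] for _ in range(n_star[i])) for i in range(n)]

rng = random.Random(7)                         # arbitrary seed
pi = [0.1, 0.25, 0.5, 0.2]                     # hypothetical sampled-unit probabilities
print(bootstrap_poisson_counts(pi, 40, rng))   # hypothetical population size N = 40
```

Repeating the call $M$ times with the same generator gives the $M$ independent replicates of step 3.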

Theoretical results for Poisson sampling
- Let $(\mathcal{F}_N, \mathcal{B}_N, P_{N,\mathrm{Poi}})$ be a probability space, where $\mathcal{B}_N$ is the power set of $\mathcal{F}_N$ and $P_{N,\mathrm{Poi}}(\cdot)$ is a probability measure on $\mathcal{F}_N$ associated with Poisson sampling.
- For any set of positive integers $J \subset \mathbb{N}^+$, let $P_{J,\mathrm{Poi}} = \bigotimes_{j \in J} P_{j,\mathrm{Poi}}$ be the product probability measure on the product space $\prod_{j \in J} \mathcal{F}_j$.
- It can be shown that there exists a probability measure $P_{\mathrm{Poi}}$ on $\mathcal{U} = \prod_{N=1}^{\infty} \mathcal{F}_N$, equipped with the product $\sigma$-algebra $\mathcal{B}$, such that $P_{J,\mathrm{Poi}} = P_{\mathrm{Poi}} \circ \xi_J^{-1}$ for every finite set of positive integers $J \subset \mathbb{N}^+$, where $\xi_J$ is the canonical projection from $\mathcal{U}$ to $\prod_{j \in J} \mathcal{F}_j$ (Klenke, 2014, Section 14.1).

Theoretical results for Poisson sampling (Cont'd)
Lemma (Lemma 3.1). Under mild conditions, we have
  $\limsup_{N \to \infty} (n_0 N^{-2} V_{\mathrm{Poi}}) = O(1)$,
  $n_0^2 N^{-3} \mu^{(3)}_{\mathrm{Poi}} = O(1)$,
  $n_0 N^{-2} (\hat{V}_{\mathrm{Poi}} - V_{\mathrm{Poi}}) \to 0$ a.s. ($P_{\mathrm{Poi}}$),
  $n_0^2 N^{-3} (\hat{\mu}^{(3)}_{\mathrm{Poi}} - \mu^{(3)}_{\mathrm{Poi}}) = o_p(1)$,
where
  $V_{\mathrm{Poi}} = \sum_{i=1}^{N} \pi_i^{-1} (1 - \pi_i) y_i^2$,
  $\mu^{(3)}_{\mathrm{Poi}} = \sum_{i=1}^{N} y_i^3 (1 - \pi_i) \{(1 - \pi_i)^2 \pi_i^{-2} - 1\}$,
  $\hat{\mu}^{(3)}_{\mathrm{Poi}} = \sum_{i=1}^{n} \pi_i^{-1} y_i^3 (1 - \pi_i) \{(1 - \pi_i)^2 \pi_i^{-2} - 1\}$.

Theoretical results for Poisson sampling (Cont'd)
Theorem (Theorem 3.1). Under mild conditions, we have
  $\hat{\mu}^{(3)}_{\mathrm{Poi}} / \hat{V}_{\mathrm{Poi}}^{3/2} = O_p(n_0^{-1/2})$.  (1)
Furthermore,
  $\hat{F}_{\mathrm{Poi}}(z) = \Phi(z) + \dfrac{\hat{\mu}^{(3)}_{\mathrm{Poi}}}{6 \hat{V}_{\mathrm{Poi}}^{3/2}} (1 - z^2) \phi(z) + o_p(n_0^{-1/2})$  (2)
a.s. ($P_{\mathrm{Poi}}$) for $z \in \mathbb{R}$, where $\hat{F}_{\mathrm{Poi}}(z)$ is the cumulative distribution function of $T_{\mathrm{Poi}} = \hat{V}_{\mathrm{Poi}}^{-1/2} (\hat{Y}_{\mathrm{Poi}} - Y)$ under Poisson sampling, and $\Phi(z)$ is the cumulative distribution function of the standard normal distribution with density function $\phi(z)$.
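To see what the skewness term in the expansion of Theorem 3.1 does numerically, the sketch below evaluates its two leading terms, dropping the $o_p(n_0^{-1/2})$ remainder. The function name and the input values are hypothetical.

```python
from statistics import NormalDist

def edgeworth_cdf(z, mu3_hat, v_hat):
    """Leading terms Phi(z) + mu3_hat / (6 * v_hat**1.5) * (1 - z**2) * phi(z)."""
    std = NormalDist()                         # standard normal
    skew = mu3_hat / v_hat ** 1.5              # O_p(n_0^{-1/2}) by equation (1)
    return std.cdf(z) + skew / 6 * (1 - z ** 2) * std.pdf(z)

# Hypothetical values of the third-moment and variance estimates.
for z in (-1.645, 0.0, 1.0, 1.645):
    print(z, edgeworth_cdf(z, mu3_hat=4.0, v_hat=4.0), NormalDist().cdf(z))
```

The correction vanishes at $z = \pm 1$ and when the estimated third moment is zero, in which case the approximation reduces to the normal one used by the Wald-type method.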

Theoretical results for Poisson sampling (Cont'd)
Theorem (Theorem 3.2). Under mild conditions, we have
  $\hat{F}^*_{\mathrm{Poi}}(z) = \Phi(z) + \dfrac{\hat{\mu}^{(3)}_{\mathrm{Poi}}}{6 \hat{V}_{\mathrm{Poi}}^{3/2}} (1 - z^2) \phi(z) + o_p(n_0^{-1/2})$  (3)
a.s. conditional on the sample $\{y_1, \ldots, y_n\}$ obtained by Poisson sampling, in probability, for $z \in \mathbb{R}$, where $\hat{F}^*_{\mathrm{Poi}}(z)$ is the cumulative distribution function of the bootstrap statistic $T^*_{\mathrm{Poi}}$ conditional on the realized sample.

Bootstrap method for SRS
1. Generate $(N_1^*, \ldots, N_n^*)$ from $\mathrm{MN}(N; \rho)$, where $\rho_i = n^{-1}$.
2. Generate a bootstrap sample of size $n$ from the bootstrap finite population $\mathcal{F}_N^*$ (containing $N_i^*$ copies of $y_i$) using SRS.
3. Repeat the two steps above independently $M$ times.
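A compact sketch of one SRS bootstrap replicate is given below. It assumes that the bootstrap population holds $N_i^*$ copies of $y_i$, which is how the multinomial step is read here; drawing $N$ values uniformly with replacement from the sample yields exactly such multinomial counts. The function name and data are hypothetical.

```python
import random

def bootstrap_srs(y_sample, N, rng):
    """One bootstrap replicate for an SRS sample from a population of size N."""
    n = len(y_sample)
    # Step 1: N uniform picks from the sample; the copy counts follow MN(N; 1/n, ..., 1/n).
    population = rng.choices(y_sample, k=N)
    # Step 2: SRS without replacement of size n from the bootstrap population.
    return population, rng.sample(population, n)

rng = random.Random(11)                        # arbitrary seed
y_sample = [4.2, 0.7, 13.5, 9.1, 2.3]          # hypothetical SRS sample
population, boot = bootstrap_srs(y_sample, 50, rng)
print(sum(population) / len(population), sum(boot) / len(boot))
```

Returning the bootstrap population as well as the bootstrap sample makes it easy to centre a bootstrap statistic at the bootstrap population mean rather than at the original estimate.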

Theoretical results for SRS
Lemma (Lemma 4.1). Under mild conditions, we have
  $\limsup_{N \to \infty} \sigma^2_{\mathrm{SRS}} = O(1)$,
  $\mu^{(3)}_{\mathrm{SRS}} = O(1)$,
  $s^2_{\mathrm{SRS}} - \sigma^2_{\mathrm{SRS}} \to 0$ a.s. ($P_{\mathrm{SRS}}$),
  $\hat{\mu}^{(3)}_{\mathrm{SRS}} - \mu^{(3)}_{\mathrm{SRS}} = o_p(1)$,
where
  $\sigma^2_{\mathrm{SRS}} = N^{-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2$,
  $\mu^{(3)}_{\mathrm{SRS}} = N^{-1} \sum_{i=1}^{N} (y_i - \bar{Y})^3$,
  $\hat{\mu}^{(3)}_{\mathrm{SRS}} = n^{-1} \sum_{i=1}^{n} y_i^3 + 2 \bar{y}_n^3 - 3 \bar{y}_n n^{-1} \sum_{i=1}^{n} y_i^2$,
  $\bar{y}_n = n^{-1} \sum_{i=1}^{n} y_i$.
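The expression for the third-moment estimator in Lemma 4.1 is the sample third central moment written in terms of raw moments. A short expansion, using only the definition of $\bar{y}_n$, makes this explicit:

```latex
\begin{align*}
n^{-1} \sum_{i=1}^{n} (y_i - \bar{y}_n)^3
  &= n^{-1} \sum_{i=1}^{n} y_i^3
     - 3 \bar{y}_n \, n^{-1} \sum_{i=1}^{n} y_i^2
     + 3 \bar{y}_n^2 \, n^{-1} \sum_{i=1}^{n} y_i
     - \bar{y}_n^3 \\
  &= n^{-1} \sum_{i=1}^{n} y_i^3
     - 3 \bar{y}_n \, n^{-1} \sum_{i=1}^{n} y_i^2
     + 2 \bar{y}_n^3 .
\end{align*}
```

The second line uses $n^{-1} \sum_{i=1}^{n} y_i = \bar{y}_n$, so that $3 \bar{y}_n^3 - \bar{y}_n^3 = 2 \bar{y}_n^3$.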

Theoretical results for SRS (Cont'd)
Theorem (Theorem 4.1). Under mild conditions, we have
  $\hat{F}_{\mathrm{SRS}}(z) = \Phi(z) + \dfrac{(1 - 2 n N^{-1}) \hat{\mu}^{(3)}_{\mathrm{SRS}}}{6 \{n (1 - n N^{-1})\}^{1/2} s^3_{\mathrm{SRS}}} (1 - z^2) \phi(z) + o_p(n^{-1/2})$  (4)
a.s. ($P_{\mathrm{SRS}}$) for $z \in \mathbb{R}$, where $\hat{F}_{\mathrm{SRS}}(z)$ is the cumulative distribution function of $T_{\mathrm{SRS}}$ under SRS; recall that $T_{\mathrm{SRS}} = \hat{V}_{\mathrm{SRS}}^{-1/2} (\hat{Y}_{\mathrm{SRS}} - Y)$.

Theoretical results for SRS (Cont'd)
Theorem (Theorem 4.2). Under mild conditions, we have
  $\hat{F}^*_{\mathrm{SRS}}(z) = \Phi(z) + \dfrac{(1 - 2 n N^{-1}) \hat{\mu}^{(3)}_{\mathrm{SRS}}}{6 \{n (1 - n N^{-1})\}^{1/2} s^3_{\mathrm{SRS}}} (1 - z^2) \phi(z) + o_p(n^{-1/2})$  (5)
a.s. conditional on the sample $\{y_1, \ldots, y_n\}$ obtained by SRS, in probability, for $z \in \mathbb{R}$, where $\hat{F}^*_{\mathrm{SRS}}(z)$ is the cumulative distribution function of the bootstrap statistic $T^*_{\mathrm{SRS}}$ conditional on the realized sample.

Bootstrap method for PPS
1. Obtain $(N^*_{a,1}, \ldots, N^*_{a,n})$ from a multinomial distribution $\mathrm{MN}(N; \rho)$, where $\rho_i \propto p_{a,i}^{-1}$.
2. Based on the bootstrap finite population $\mathcal{F}_N^*$, sample one element, with selection probability $(C_N^*)^{-1} p_i^*$ for the $i$-th element, independently $n$ times, where $C_N^* = \sum_{i=1}^{N} p_i^* = \sum_{i=1}^{n} N^*_{a,i} p_{a,i}$.
3. Repeat the two steps above independently $M$ times.
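The PPS steps above can be sketched in the same style. The sketch assumes that each element of the bootstrap population inherits the selection probability of the sampled element it copies, so that the normalising constant is the sum of those inherited values. The function name and inputs are hypothetical.

```python
import random

def bootstrap_pps(p_s, N, rng):
    """One bootstrap replicate for a PPS sample with selection probabilities p_s.

    Returns (draws, c_star), where draws[i] is the index (within the original
    sample) of the element selected on the i-th bootstrap draw.
    """
    n = len(p_s)
    # Step 1: N multinomial trials with rho_i proportional to 1 / p_{a,i};
    # `population` lists the bootstrap population by original sample index.
    population = rng.choices(range(n), weights=[1 / p for p in p_s], k=N)
    # Step 2: every bootstrap-population element keeps its original p; dividing
    # by c_star turns these into selection probabilities that sum to one.
    sizes = [p_s[k] for k in population]
    c_star = sum(sizes)
    draws = rng.choices(population, weights=[w / c_star for w in sizes], k=n)
    return draws, c_star

rng = random.Random(5)                         # arbitrary seed
p_s = [0.004, 0.001, 0.002, 0.003]             # hypothetical sampled-unit probabilities
print(bootstrap_pps(p_s, 500, rng))            # hypothetical population size N = 500
```

Because the multinomial weights are proportional to $1/p_{a,i}$, the constant `c_star` is close to one on average, so the rescaling is typically a small adjustment.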

Theoretical results for PPS
Lemma (Lemma 5.1). Under mild conditions, we have
  $\limsup_{N \to \infty} (N^{-2} \sigma^2_{\mathrm{PPS}}) = O(1)$,
  $N^{-3} \mu^{(3)}_{\mathrm{PPS}} = O(1)$,
  $N^{-2} (s^2_{\mathrm{PPS}} - \sigma^2_{\mathrm{PPS}}) \to 0$ a.s. ($P_{\mathrm{PPS}}$),
  $N^{-3} (\hat{\mu}^{(3)}_{\mathrm{PPS}} - \mu^{(3)}_{\mathrm{PPS}}) = o_p(1)$,
where
  $\sigma^2_{\mathrm{PPS}} = \sum_{i=1}^{N} p_i (p_i^{-1} y_i - Y)^2$,
  $\mu^{(3)}_{\mathrm{PPS}} = \sum_{i=1}^{N} p_i (p_i^{-1} y_i - Y)^3$,
  $s^2_{\mathrm{PPS}}$ is the sample variance of $\{z_i : i = 1, \ldots, n\}$ with $z_i = p_{a,i}^{-1} y_{a,i}$,
  $\hat{\mu}^{(3)}_{\mathrm{PPS}} = n^{-1} \sum_{i=1}^{n} z_i^3 + 2 \bar{z}_n^3 - 3 \bar{z}_n n^{-1} \sum_{i=1}^{n} z_i^2$.

Theoretical results for PPS (Cont'd)
Theorem (Theorem 5.1). Under mild conditions, we have
  $\hat{F}_{\mathrm{PPS}}(z) = \Phi(z) + \dfrac{\hat{\mu}^{(3)}_{\mathrm{PPS}}}{6 n^{1/2} s^3_{\mathrm{PPS}}} (1 - z^2) \phi(z) + o_p(n^{-1/2})$  (6)
a.s. ($P_{\mathrm{PPS}}$), where $\hat{F}_{\mathrm{PPS}}$ is the cumulative distribution function of $T_{\mathrm{PPS}} = \hat{V}_{\mathrm{PPS}}^{-1/2} (\hat{Y}_{\mathrm{PPS}} - Y)$ under PPS sampling.

Theoretical results for PPS (Cont'd)
Theorem (Theorem 5.2). Under mild conditions, we have
  $\hat{F}^*_{\mathrm{PPS}}(z) = \Phi(z) + \dfrac{\hat{\mu}^{(3)}_{\mathrm{PPS}}}{6 n^{1/2} s^3_{\mathrm{PPS}}} (1 - z^2) \phi(z) + o_p(n^{-1/2})$  (7)
a.s. conditional on the sample obtained by PPS sampling, in probability, for $z \in \mathbb{R}$, where $\hat{F}^*_{\mathrm{PPS}}(z)$ is the conditional distribution of the bootstrap statistic $T^*_{\mathrm{PPS}}$ given the realized sample.

4. Simulation studies

Single-stage sampling designs
- A finite population $\mathcal{F}_N = \{y_1, \ldots, y_N\}$ is generated by $y_i \sim \mathrm{Ex}(10)$ with $N = 500$, where $\mathrm{Ex}(\lambda)$ is an exponential distribution with scale parameter $\lambda$.
- The size measure is simulated by $z_i = \log(3 + s_i)$ with $s_i \mid y_i \sim \mathrm{Ex}(y_i)$.
- The expected sample size is $n_0 \in \{10, 100\}$.
- Goal: construct a 90% confidence interval for $\bar{Y}$ under
  - Poisson sampling with $\pi_i \propto z_i$ and $\sum_{i=1}^{N} \pi_i = n_0$,
  - SRS with sample size $n_0$,
  - PPS sampling with $p_i \propto z_i$ and sample size $n_0$.
- Let $\tilde{Y}$ be the design-unbiased estimate of $\bar{Y}$ under a specific sampling design, with variance estimator $\tilde{V}$.
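The simulated population and the design probabilities described above might be generated along the following lines. The seed and the function name are hypothetical; `random.expovariate` takes a rate, so a scale of $\lambda$ is passed as `1 / lambda`.

```python
import math
import random

def make_population(N, n0, rng):
    """Generate y, the size measure z, Poisson probabilities pi and PPS probabilities p."""
    y = [rng.expovariate(1 / 10) for _ in range(N)]    # y_i ~ Ex(10), scale 10
    s = [rng.expovariate(1 / yi) for yi in y]          # s_i | y_i ~ Ex(y_i), scale y_i
    z = [math.log(3 + si) for si in s]                 # size measure
    total_z = sum(z)
    pi = [n0 * zi / total_z for zi in z]               # pi_i proportional to z_i, sums to n0
    p = [zi / total_z for zi in z]                     # p_i proportional to z_i, sums to 1
    return y, z, pi, p

rng = random.Random(0)                                 # arbitrary seed
y, z, pi, p = make_population(500, 100, rng)
print(sum(y) / len(y), max(pi))
```

Since $z_i \ge \log 3$ for every unit, the size measure is bounded away from zero, which keeps the inverse-probability weights from exploding.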

Single-stage sampling designs (Cont'd)
- We consider two methods to obtain the 90% confidence interval.
  - Proposed method with $M = 1{,}000$, that is,
      $(\tilde{Y} - q^*_{B,0.95} \tilde{V}^{1/2}, \; \tilde{Y} - q^*_{B,0.05} \tilde{V}^{1/2})$,
    where $q^*_{B,p}$ is the $p$-th quantile of $\{T^{*(m)} : m = 1, \ldots, M\}$, $T^{*(m)} = (\tilde{V}^{*(m)})^{-1/2} (\tilde{Y}^{*(m)} - \bar{Y}^{*(m)})$, and $\tilde{V}^{*(m)}$, $\tilde{Y}^{*(m)}$ and $\bar{Y}^{*(m)}$ are the corresponding quantities in the $m$-th resampling.
  - Wald-type method, that is,
      $(\tilde{Y} - q_{0.95} \tilde{V}^{1/2}, \; \tilde{Y} - q_{0.05} \tilde{V}^{1/2})$,
    where $q_p = \Phi^{-1}(p)$.
- 1,000 Monte Carlo simulations are conducted for each sampling design.
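Given the point estimate, its variance estimate, and the $M$ bootstrap pivots, the two intervals can be computed as sketched below. The empirical-quantile rule (the smallest order statistic with at least a fraction $p$ of the values at or below it) is one common convention and is an assumption here; the function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def empirical_quantile(values, p):
    """Smallest value with at least a fraction p of the values at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]

def bootstrap_t_interval(est, var, t_star, lower=0.05, upper=0.95):
    """Interval (est - q_upper * se, est - q_lower * se) from the bootstrap pivots."""
    se = math.sqrt(var)
    return (est - empirical_quantile(t_star, upper) * se,
            est - empirical_quantile(t_star, lower) * se)

def wald_interval(est, var, lower=0.05, upper=0.95):
    """Same form, with standard normal quantiles in place of bootstrap quantiles."""
    se = math.sqrt(var)
    std = NormalDist()
    return est - std.inv_cdf(upper) * se, est - std.inv_cdf(lower) * se

t_star = [-2.9, -1.4, -0.8, -0.2, 0.1, 0.5, 0.9, 1.2]   # hypothetical pivots
print(bootstrap_t_interval(10.0, 4.0, t_star))
print(wald_interval(10.0, 4.0))
```

With left-skewed pivots, as in the toy list, the bootstrap interval extends further to the right of the estimate than the symmetric Wald-type interval does.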

Single-stage sampling designs (Cont'd)
C.R.: coverage rate; C.L.: length of the 90% confidence interval.

Design    Method       n_0 = 10        n_0 = 100
                       C.R.    C.L.    C.R.    C.L.
Poisson   Bootstrap    0.89    15.2    0.89    3.6
          Wald-type    0.83    11.9    0.87    3.5
SRS       Bootstrap    0.86    11.3    0.90    2.8
          Wald-type    0.82     8.9    0.90    2.8
PPS       Bootstrap    0.88    10.3    0.90    2.6
          Wald-type    0.82     7.5    0.89    2.5

Two-stage sampling designs
- A finite population $\mathcal{F}_N = \{y_{i,j} : i = 1, \ldots, H; \; j = 1, \ldots, N_i\}$ is generated by
    $y_{i,j} = 50 + a_i + e_{i,j}$, $a_i \sim N(0, 50)$, $e_{i,j} \sim \mathrm{Ex}(20)$, $N_i \mid a_i \sim \mathrm{Po}(q_i) + c_0$,
  where $H = 100$, $\mathrm{Po}(\lambda)$ is a Poisson distribution with rate parameter $\lambda$, $q_i = (a_i - 25)^2 / 20$, and $c_0 = 40$ is the minimum cluster size.
- The finite population size is $N = 17{,}011$. The cluster sizes range from 40 to 542.
- We assume that $N$ and $N_1, \ldots, N_H$ are known.

Two-stage sampling designs (Cont'd)
- Goal: construct 90% confidence intervals for $\bar{Y}$ and $P = N^{-1} \sum_{i=1}^{H} \sum_{j=1}^{N_i} \delta_{(-\infty, q_y)}(y_{i,j})$.
- We consider two different sampling designs for the first stage:
  - Poisson sampling with $\pi_i \propto N_i$ and $\sum_{i=1}^{H} \pi_i = n_1$,
  - PPS sampling with $p_i \propto z_i$ and sample size $n_1$.
- We use SRS as the second-stage sampling design, with sample size $n_2$ for each sampled cluster.
- We consider $n_1 \in \{10, 30\}$ and $n_2 \in \{10, 30\}$.

Two-stage sampling designs (Cont'd)
- We consider two methods to obtain the 90% confidence interval.
  - The proposed method, extended to a two-stage sampling design, with $M = 500$. That is, the finite population is bootstrapped in two steps:
    1. Use the proposed method to bootstrap the $H$ clusters by treating them as elements; the original sample within each selected cluster is replicated accordingly.
    2. For each bootstrap cluster, apply the proposed method to bootstrap the cluster finite population independently.
  - The Wald-type method, which is the same as the one discussed before.
- 500 Monte Carlo simulations are conducted for each sampling design.

Two-stage sampling designs (Cont'd)
Table: Coverage rate (C.R.) and length (C.L.) of the 90% confidence interval for $\bar{Y}$.

Design    n_2    Method       n_1 = 10        n_1 = 30
                              C.R.    C.L.    C.R.    C.L.
Poisson   10     Bootstrap    0.91    74.8    0.90    34.4
                 Wald-type    0.89    69.6    0.90    33.9
          30     Bootstrap    0.90    74.5    0.90    34.2
                 Wald-type    0.89    69.4    0.90    33.6
PPS       10     Bootstrap    0.90     9.1    0.91     4.8
                 Wald-type    0.87     8.0    0.90     4.7
          30     Bootstrap    0.90     7.8    0.89     4.1
                 Wald-type    0.86     6.8    0.88     4.0

Two-stage sampling designs (Cont'd)
Table: Coverage rate (C.R.) and length (C.L.) of the 90% confidence interval for $P$.

Design    n_2    Method       n_1 = 10        n_1 = 30
                              C.R.    C.L.    C.R.    C.L.
Poisson   10     Bootstrap    0.91    0.6     0.89    0.3
                 Wald-type    0.87    0.5     0.88    0.3
          30     Bootstrap    0.88    0.6     0.90    0.2
                 Wald-type    0.87    0.5     0.91    0.2
PPS       10     Bootstrap    0.89    0.3     0.90    0.2
                 Wald-type    0.84    0.2     0.90    0.2
          30     Bootstrap    0.90    0.3     0.90    0.1
                 Wald-type    0.85    0.2     0.89    0.1

Remarks on the simulation studies
- For the two-stage sampling designs:
  - The sampling distribution of $\tilde{Y}$ is approximately symmetric under both designs, even when the sample size is small.
  - The sampling distribution of the proportion estimator is slightly right-skewed when $n_1 = 10$.
- We have compared the proposed method with the nonparametric Bayesian bootstrap method (Dong et al., 2014) and with the method based on two-step inverse sampling (Sverchkov and Pfeffermann, 2004); the proposed method works better in terms of the coverage rate.

5. Conclusions

Conclusions
- We propose a bootstrap method for Poisson sampling, SRS, and PPS sampling, and we show that the proposed method is second-order accurate.
- It is necessary to estimate the variance of the design-unbiased estimator, since the proposed method is based on an asymptotically pivotal statistic.
- Although the proposed method is discussed under single-stage sampling designs, simulation shows that it works well under some two-stage sampling designs.
- It may be extended to other complex sampling designs when the asymptotic distribution of the design-unbiased estimator exists, but second-order accuracy may not be guaranteed.
- The proposed method can be easily parallelized in practice.

Selected references
- Dong, Q., Elliott, M. R. & Raghunathan, T. E. (2014). A nonparametric method to generate synthetic populations to adjust for complex sampling design features. Surv. Methodol. 40, 29-46.
- Fuller, W. A. (2009). Sampling Statistics. Hoboken: John Wiley.
- Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer Science & Business Media.
- Klenke, A. (2014). Probability Theory: A Comprehensive Course. 2nd edition. London: Springer-Verlag London Ltd.
- Rao, J. N. K. & Wu, C. F. J. (1988). Resampling inference with complex survey data. J. Amer. Statist. Assoc. 83, 231-241.
- Sitter, R. R. (1992). A resampling procedure for complex survey data. J. Amer. Statist. Assoc. 87, 755-765.
- Sverchkov, M. & Pfeffermann, D. (2004). Prediction of finite population totals based on the sample distribution. Surv. Methodol. 30, 79-92.

Thank you!